Abstract

The basic model for high-frequency data in finance is considered, where an efficient price process is observed under microstructure noise. It is shown that this nonparametric model is in Le Cam's sense asymptotically equivalent to a Gaussian shift experiment in terms of the square root of the volatility function $\sigma$. As an application, simple rate-optimal estimators of the volatility and efficient estimators of the integrated volatility are constructed.

1 Introduction

In recent years volatility estimation from high-frequency data has attracted a lot 
of attention in financial econometrics and statistics. Due to empirical evidence that 
the observed transaction prices of assets cannot follow a semi-martingale model, a 
prominent approach is to model the observations as the superposition of the true (or 
efficient) price process with some measurement error, conceived as microstructure 
noise. The main features are already present in the basic model of observing 

$$Y_i = X_{i/n} + \varepsilon_i, \quad i = 1, \dots, n, \tag{1.1}$$

with an efficient price process $X_t = \int_0^t \sigma(s)\,dB_s$, $B$ a standard Brownian motion, and $\varepsilon_i \sim N(0, \delta^2)$ all independent. The aim is to perform statistical inference on the volatility function $\sigma^2 : [0,1] \to \mathbb{R}^+$, e.g. estimating the so-called integrated volatility $\int_0^1 \sigma^2(t)\,dt$ over the trading day.
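To make the observation scheme (1.1) concrete, the following minimal sketch simulates noisy observations for a given volatility path. It is an illustration only: the function names, the Euler discretisation of the stochastic integral and the example volatility are assumptions introduced here, not part of the paper.

```python
import math
import random

def simulate_noisy_prices(n, sigma, delta, seed=0):
    """Simulate Y_i = X_{i/n} + eps_i, i = 1..n, as in model (1.1).

    sigma: function t -> sigma(t) on [0, 1] (hypothetical input for this sketch)
    delta: standard deviation of the microstructure noise
    The stochastic integral is approximated by an Euler scheme on the grid i/n.
    """
    rng = random.Random(seed)
    x = 0.0
    ys = []
    for i in range(1, n + 1):
        t_left = (i - 1) / n
        # increment of X over [(i-1)/n, i/n], volatility frozen at the left endpoint
        x += sigma(t_left) * rng.gauss(0.0, math.sqrt(1.0 / n))
        ys.append(x + rng.gauss(0.0, delta))
    return ys

def integrated_volatility(sigma, grid=10_000):
    """Midpoint rule for the target quantity int_0^1 sigma^2(t) dt."""
    return sum(sigma((k + 0.5) / grid) ** 2 for k in range(grid)) / grid

# Example volatility path (an assumption made for illustration only)
example_sigma = lambda t: 1.0 + 0.5 * math.sin(2 * math.pi * t)
observations = simulate_noisy_prices(1000, example_sigma, delta=0.01)
print(len(observations), integrated_volatility(example_sigma))
```

The simulation freezes the volatility on each sampling interval, which is only an approximation of the continuous-time model; it is meant to show the structure of the data, not to serve as an exact sampler.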

The mathematical foundation on the parametric formulation of this model has been laid by Gloter and Jacod (2001a) who prove the interesting result that the model is locally asymptotically normal (LAN) as $n \to \infty$, but with the unusual rate $n^{-1/4}$, while without microstructure noise the rate is $n^{-1/2}$. Starting with Zhang, Mykland, and Aït-Sahalia (2005), the nonparametric model has come into the focus of research. Mainly three different, but closely related approaches have been proposed afterwards to estimate the integrated volatility: multi-scale estimators (Zhang 2006), realized kernels or autocovariances (Barndorff-Nielsen, Hansen, Lunde, and Shephard 2008) and pre-averaging (Jacod, Li, Mykland, Podolskij, and Vetter 2009). Under various degrees of generality, especially also for stochastic volatility, all authors provide central limit theorems with convergence rate $n^{-1/4}$ and an asymptotic variance involving the so-called quarticity $\int_0^1 \sigma^4(t)\,dt$. Recently, also the problem of estimating the spot volatility $\sigma^2(t)$ itself has found some interest (Munk and Schmidt-Hieber 2009).

The aim of the present paper is to provide a thorough mathematical understanding of the basic model, to explain why statistical inference is not so canonical and to propose a simple estimator of the integrated volatility which is efficient. To this end we employ Le Cam's concept of asymptotic equivalence between experiments. In fact, our main theoretical result in Theorem 6.2 states under some regularity conditions that observing $(Y_i)$ in (1.1) is for $n \to \infty$ asymptotically equivalent to observing the Gaussian shift experiment

$$dY_t = \sqrt{2\sigma(t)}\,dt + \delta^{1/2} n^{-1/4}\,dW_t, \quad t \in [0,1],$$

with Gaussian white noise $dW$. Not only the large noise level $\delta^{1/2} n^{-1/4}$ is apparent, but also a non-linear $\sqrt{\sigma(t)}$-form of the signal, from which optimal asymptotic variance results can be derived. Note that a similar form of a Gaussian shift was found to be asymptotically equivalent to nonparametric density estimation (Nussbaum 1996). Key ingredients of our asymptotic equivalence proof are the results by Grama and Nussbaum (2002) on asymptotic equivalence for generalized nonparametric regression, but also ideas from Carter (2006) and Reiß (2008) play a role. Moreover, fine bounds on Hellinger distances for Gaussian measures with different covariance operators turn out to be essential.

Roughly speaking, asymptotic equivalence means that any statistical inference procedure can be transferred from one experiment to the other such that the asymptotic risk remains the same, at least for bounded loss functions. Technically, two sequences of experiments $\mathcal{E}^n$ and $\mathcal{G}^n$, defined on possibly different sample spaces, but with the same parameter set, are asymptotically equivalent if the Le Cam distance $\Delta(\mathcal{E}^n, \mathcal{G}^n)$ tends to zero. For $\mathcal{E}_i = (\mathcal{X}_i, \mathcal{F}_i, (P^i_\vartheta)_{\vartheta \in \Theta})$, $i = 1, 2$, by definition, $\Delta(\mathcal{E}_1, \mathcal{E}_2) = \max\{\delta(\mathcal{E}_1, \mathcal{E}_2), \delta(\mathcal{E}_2, \mathcal{E}_1)\}$ holds in terms of the deficiency $\delta(\mathcal{E}_1, \mathcal{E}_2) = \inf_M \sup_{\vartheta \in \Theta} \|M P^1_\vartheta - P^2_\vartheta\|_{TV}$, where the infimum is taken over all randomisations or Markov kernels $M$ from $(\mathcal{X}_1, \mathcal{F}_1)$ to $(\mathcal{X}_2, \mathcal{F}_2)$, see e.g. Le Cam and Yang (2000) for details. In particular, $\delta(\mathcal{E}_1, \mathcal{E}_2) = 0$ means that $\mathcal{E}_1$ is more informative than $\mathcal{E}_2$ in the sense that any observation in $\mathcal{E}_2$ can be obtained from $\mathcal{E}_1$, possibly using additional randomisations. Here, we shall always explicitly construct the transformations and randomisations and we shall then only use that $\Delta(\mathcal{E}_1, \mathcal{E}_2) \le \sup_{\vartheta \in \Theta} \|P^1_\vartheta - P^2_\vartheta\|_{TV}$ holds when both experiments are defined on the same sample space.

The asymptotic equivalence is deduced stepwise. In Section 2 the regression-type model (1.1) is shown to be asymptotically equivalent to a corresponding white noise model with signal $X$. Then in Section 3 a very simple construction yields a Gaussian shift model with signal $\log(\sigma^2(\bullet) + c)$, $c > 0$ some constant, which is asymptotically less informative, but only by a constant factor in the Fisher information. Inspired by this construction, we present a generalisation in Section 4 where the information loss can be made arbitrarily small (but not zero), before applying nonparametric local asymptotic theory in Section 5 to derive asymptotic equivalence with our final Gaussian shift model for shrinking local neighbourhoods of the parameters. Section 6 yields the global result, which is based on an asymptotic sufficiency result for simple independent statistics.

Extensions and restrictions are discussed in Section 7 before we use the theoretical insight to construct in Section 8 a rate-optimal estimator of the spot volatility and an efficient estimator of the integrated volatility by a locally-constant approximation. Remarkably, the asymptotic variance is found to depend on the third moment $\int_0^1 \sigma^3(t)\,dt$ and for non-constant $\sigma^2(\bullet)$ our estimator outperforms previous approaches applied to the basic model. Constructions needed for the proof are presented and discussed alongside the mathematical results, deferring more technical parts to the Appendix, which in Section 9.1 also contains a summary of results on white noise models, the Hellinger distance and Hilbert-Schmidt norm estimates.

2 The regression and white noise model 

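In the main part we shall work in the white noise setting, which is more intuitive to handle than the regression setting, which in turn is the observation model in practice. Let us define both models formally. For that we introduce the Hölder ball

$$C^\alpha(R) := \{f \in C^\alpha([0,1]) \mid \|f\|_{C^\alpha} \le R\} \quad\text{with}\quad \|f\|_{C^\alpha} = \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|^\alpha}.$$

2.1 Definition. Let $\mathcal{E}_0 = \mathcal{E}_0(n, \delta, \alpha, R, \underline{\sigma}^2)$ with $n \in \mathbb{N}$, $\delta > 0$, $\alpha \in (0,1)$, $R > 0$, $\underline{\sigma}^2 \ge 0$ be the statistical experiment generated by observing (1.1). The volatility $\sigma^2$ belongs to the class

$$\mathcal{S}(\alpha, R, \underline{\sigma}^2) := \Big\{\sigma^2 \in C^\alpha(R) \;\Big|\; \min_{t \in [0,1]} \sigma^2(t) \ge \underline{\sigma}^2\Big\}.$$

Let $\mathcal{E}_1 = \mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2)$ with $\varepsilon > 0$, $\alpha \in (0,1)$, $R > 0$, $\underline{\sigma}^2 \ge 0$ be the statistical experiment generated by observing

$$dY_t = X_t\,dt + \varepsilon\,dW_t, \quad t \in [0,1],$$

with $X_t = \int_0^t \sigma(s)\,dB_s$ as above, independent standard Brownian motions $W$ and $B$ and $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.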

From Brown and Low (1996) it is well known that the white noise and the Gaussian regression model are asymptotically equivalent for noise level $\varepsilon = \delta/\sqrt{n} \to 0$ as $n \to \infty$, provided the signal is $\beta$-Hölder continuous for $\beta > 1/2$. Since Brownian motion and thus also our price process $X$ is only Hölder continuous of order $\beta < 1/2$ (whatever $\sigma$ is), it is not clear whether asymptotic equivalence can hold for the experiments $\mathcal{E}_0$ and $\mathcal{E}_1$. Yet, this is true. Subsequently, we employ the notation $A_n \lesssim B_n$ if $A_n = O(B_n)$ and $A_n \sim B_n$ if $A_n \lesssim B_n$ as well as $B_n \lesssim A_n$ and obtain:

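2.2 Theorem. For any $\alpha > 0$, $\underline{\sigma}^2 \ge 0$ and $\delta, R > 0$ the experiments $\mathcal{E}_0$ and $\mathcal{E}_1$ with $\varepsilon = \delta/\sqrt{n}$ are asymptotically equivalent; more precisely:

$$\Delta\big(\mathcal{E}_0(n, \delta, \alpha, R, \underline{\sigma}^2),\, \mathcal{E}_1(\delta/\sqrt{n}, \alpha, R, \underline{\sigma}^2)\big) \lesssim R\,\delta^{-2} n^{-\alpha}.$$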



Interestingly, the asymptotic equivalence holds for any positive Hölder regularity $\alpha > 0$. In particular, the volatility $\sigma^2$ could be itself a continuous semi-martingale, but such that $X$ conditionally on $\sigma^2$ remains Gaussian. As the proof in Section 9.2 of the appendix reveals, we construct the equivalence by rate-optimal approximations of the anti-derivative of $\sigma^2$ which lies in $C^{1+\alpha}$. Similar techniques have been used by Carter (2006) and Reiß (2008), but here we have to cope with the random signal for which we need to bound the Hilbert-Schmidt norm of the respective covariance operators. Note further that the asymptotic equivalence even holds when the level of the microstructure noise $\delta$ tends to zero, provided $\delta^2 n^\alpha \to \infty$ remains valid.

3 Less informative Gaussian shift experiments 

From now on we shall work with the white noise observation experiment $\mathcal{E}_1$, where the main structures are more clearly visible. In this section we shall find easy Gaussian shift models which are asymptotically not more informative than $\mathcal{E}_1$, but already permit rate-optimal estimation results. The whole idea is easy to grasp once we can replace the volatility $\sigma^2$ by a piecewise constant approximation on small blocks of size $h$. That this is no loss of generality is shown by the subsequent asymptotic equivalence result, proved in Section 9.3 of the appendix.

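3.1 Definition. Let $\mathcal{E}_2 = \mathcal{E}_2(\varepsilon, h, \alpha, R, \underline{\sigma}^2)$ be the statistical experiment generated by observing

$$dY_t = X^h_t\,dt + \varepsilon\,dW_t, \quad t \in [0,1],$$

with $X^h_t = \int_0^t \sigma(\lfloor s \rfloor_h)\,dB_s$, $\lfloor s \rfloor_h := \lfloor s/h \rfloor h$ for $h > 0$ and $h^{-1} \in \mathbb{N}$, and independent standard Brownian motions $W$ and $B$. The volatility $\sigma^2$ belongs to the class $\mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

3.2 Proposition. Assume $\alpha > 1/2$ and $\underline{\sigma}^2 > 0$. Then for $\varepsilon \to 0$, $h^\alpha = o(\varepsilon^{1/2})$ the experiments $\mathcal{E}_1$ and $\mathcal{E}_2$ are asymptotically equivalent; more precisely:

$$\Delta\big(\mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2),\, \mathcal{E}_2(\varepsilon, h, \alpha, R, \underline{\sigma}^2)\big) \lesssim R\,\underline{\sigma}^{-3/2} h^\alpha \varepsilon^{-1/2}.$$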

In the sequel we always assume $h^\alpha = o(\varepsilon^{1/2})$ to hold such that we can work equivalently with $\mathcal{E}_2$. Recall that observing $Y$ in a white noise model is equivalent to observing $(\int e_m\,dY)_{m \ge 1}$ for an orthonormal basis $(e_m)_{m \ge 1}$ of $L^2([0,1])$, cf. also Subsection 9.1 below. Our first step is thus to find an orthonormal system (not a basis) which extracts as much local information on $\sigma^2$ as possible. For any $\varphi \in L^2([0,1])$ with $\|\varphi\|_{L^2} = 1$ we have by partial integration

$$\begin{aligned}
\int_0^1 \varphi(t)\,dY_t &= \int_0^1 \varphi(t) X^h_t\,dt + \varepsilon \int_0^1 \varphi(t)\,dW_t \\
&= \Phi(1) X^h_1 - \Phi(0) X^h_0 - \int_0^1 \Phi(t)\,\sigma(\lfloor t \rfloor_h)\,dB_t + \varepsilon \int_0^1 \varphi(t)\,dW_t \\
&= \Big(\int_0^1 \Phi^2(t)\,\sigma^2(\lfloor t \rfloor_h)\,dt + \varepsilon^2\Big)^{1/2} \zeta_\varphi,
\end{aligned} \tag{3.1}$$

where $\Phi(t) = -\int_t^1 \varphi(s)\,ds$ is the antiderivative of $\varphi$ with $\Phi(1) = 0$ and $\zeta_\varphi \sim N(0,1)$ holds. To ensure that $\Phi$ has only support in some interval $[kh, (k+1)h]$, we require $\varphi$ to have support in $[kh, (k+1)h]$ and to satisfy $\int \varphi(t)\,dt = 0$. The function $\varphi_k$ with $\operatorname{supp}(\varphi_k) = [kh, (k+1)h]$, $\|\varphi_k\|_{L^2} = 1$, $\int \varphi_k(t)\,dt = 0$ that maximizes the information load $\int \Phi_k^2(t)\,dt$ for $\sigma^2(kh)$ is given by (use Lagrange theory)

$$\varphi_k(t) = \sqrt{2}\,h^{-1/2} \cos\big(\pi(t - kh)/h\big)\,\mathbf{1}_{[kh,(k+1)h]}(t), \quad t \in [0,1]. \tag{3.2}$$

The $L^2$-orthonormal system $(\varphi_k)$ for $k = 0, 1, \dots, h^{-1} - 1$ is now used to construct Gaussian shift observations. In $\mathcal{E}_2$ we obtain from (3.1) the observations

$$y_k := \int \varphi_k(t)\,dY_t = \big(h^2 \pi^{-2} \sigma^2(kh) + \varepsilon^2\big)^{1/2} \zeta_k, \quad k = 0, \dots, h^{-1} - 1, \tag{3.3}$$

with independent standard normal random variables $(\zeta_k)_{k = 0, \dots, h^{-1} - 1}$. Observing $(y_k)$ is clearly equivalent to observing

$$z_k := \log(y_k^2 h^{-2} \pi^2) - E[\log(\zeta_k^2)] = \log\big(\sigma^2(kh) + \varepsilon^2 h^{-2} \pi^2\big) + \eta_k \tag{3.4}$$

for $k = 0, \dots, h^{-1} - 1$ with $\eta_k := \log(\zeta_k^2) - E[\log(\zeta_k^2)]$.

We have found a nonparametric regression model with regression function $\log(\sigma^2(\bullet) + \varepsilon^2 h^{-2} \pi^2)$ and equidistant observations corrupted by non-Gaussian, but centered noise $(\eta_k)$ of variance 2. To ensure that the regression function does not change under the asymptotics $\varepsilon \to 0$, we specify the block size $h = h(\varepsilon) = h_0 \varepsilon$ with some fixed constant $h_0 > 0$.

It is not surprising that the nonparametric regression experiment in (3.4) is equivalent to a corresponding Gaussian shift experiment. Indeed, this follows readily from results by Grama and Nussbaum (2002) who in their Section 4.2 derive asymptotic equivalence already for our Gaussian scale model (3.3). Note, however, that their Fisher information should be $I(\vartheta) = \frac{1}{2} \vartheta^{-2}$ and we thus have asymptotic equivalence of (3.3) with the Gaussian regression model

$$w_k = \frac{1}{\sqrt{2}} \log\big(\sigma^2(kh) + h_0^{-2} \pi^2\big) + \gamma_k, \quad k = 0, \dots, h^{-1} - 1,$$

where $\gamma_k \sim N(0,1)$ i.i.d. Since by the classical result of Brown and Low (1996) the Gaussian regression is equivalent to the corresponding white noise experiment (note that $\log(\sigma^2(\bullet) + h_0^{-2} \pi^2)$ is also $\alpha$-Hölder continuous), we have already derived an important and far-reaching result.

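3.3 Theorem. For $\alpha > 1/2$ and $\underline{\sigma}^2 > 0$ the high frequency experiment $\mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2)$ is asymptotically more informative than the Gaussian shift experiment $\mathcal{G}_1(\varepsilon, \alpha, R, \underline{\sigma}^2, h_0)$ of observing

$$dZ_t = \frac{1}{\sqrt{2}} \log\big(\sigma^2(t) + h_0^{-2} \pi^2\big)\,dt + h_0^{1/2} \varepsilon^{1/2}\,dW_t, \quad t \in [0,1].$$

Here $h_0 > 0$ is an arbitrary constant and $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

3.4 Remark. Moving the constants from the diffusion to the drift part, the experiment $\mathcal{G}_1$ is equivalent to observing

$$dZ_t = (2h_0)^{-1/2} \log\big(\sigma^2(t) + h_0^{-2} \pi^2\big)\,dt + \varepsilon^{1/2}\,dW_t, \quad t \in [0,1]. \tag{3.5}$$

The Gaussian shift experiment is nonlinear in $\sigma^2$ which is to be expected. Writing $\varepsilon = \delta/\sqrt{n}$ gives us the noise level $\delta^{1/2} n^{-1/4}$ which appears in all previous work on the model $\mathcal{E}_0$.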

To quantify the amount of information we have lost, let us study the LAN-property of the constant parametric case $\sigma^2(t) = \sigma_0^2 > 0$ in $\mathcal{G}_1$. We consider the local alternatives $\sigma_\varepsilon^2 = \sigma_0^2 + \varepsilon^{1/2}$ for which we obtain the Fisher information $I_{h_0} = (2h_0)^{-1} h_0^4/(\pi^2 + h_0^2 \sigma_0^2)^2$. Maximizing over $h_0$ yields $h_0 = \sqrt{3}\,\pi \sigma_0^{-1}$ and the Fisher information is at most equal to

$$\sup_{h_0 > 0} I_{h_0} = \sigma_0^{-3}\, 3^{3/2}/(32\pi) \approx 0.0517\,\sigma_0^{-3}.$$

By the LAN-result of Gloter and Jacod (2001a) for $\mathcal{E}_0$ the best value is $I(\sigma_0) = \frac{1}{8} \sigma_0^{-3}$ which is clearly larger. Note, however, that the relative (normalized) efficiency is already $\sqrt{8 \cdot 3^{3/2}/(32\pi)} \approx 0.64$, which means that we attain about 64% of the precision when working with $\mathcal{G}_1$ instead of $\mathcal{E}_0$ or $\mathcal{E}_1$.
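The maximisation over $h_0$ and the resulting relative efficiency can be checked numerically. The sketch below evaluates the Fisher information $(2h_0)^{-1} h_0^4/(\pi^2 + h_0^2\sigma_0^2)^2$ stated above and maximises it by a golden-section search; the function names, the search interval and the choice $\sigma_0 = 1$ are assumptions made for this illustration.

```python
import math

def fisher_info_g1(h0, sigma0):
    """Fisher information (2 h0)^{-1} h0^4 / (pi^2 + h0^2 sigma0^2)^2 from the text."""
    return h0 ** 4 / (2.0 * h0 * (math.pi ** 2 + h0 ** 2 * sigma0 ** 2) ** 2)

def maximise_over_h0(sigma0, lo=1e-3, hi=1e3, steps=200):
    """Golden-section search for the maximiser of h0 -> fisher_info_g1(h0, sigma0).

    The map is increasing and then decreasing in h0, so the search is applicable.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(steps):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if fisher_info_g1(c, sigma0) < fisher_info_g1(d, sigma0):
            a = c
        else:
            b = d
    return 0.5 * (a + b)

sigma0 = 1.0
h_star = maximise_over_h0(sigma0)
i_star = fisher_info_g1(h_star, sigma0)
i_opt = 1.0 / (8.0 * sigma0 ** 3)                  # optimal value quoted in the text
print(h_star, math.sqrt(3.0) * math.pi / sigma0)   # numerical vs closed-form maximiser
print(i_star, 3 ** 1.5 / (32 * math.pi))           # numerical vs closed-form supremum
print(math.sqrt(i_star / i_opt))                   # relative (normalized) efficiency
```

The square root in the last line reflects that efficiency is measured on the scale of standard deviations rather than variances.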



4 A close sequence of simple models 

In order to decrease the information loss in $\mathcal{G}_1$, we now take into account higher frequencies in each block $[kh, (k+1)h]$. In a frequency-location notation $(j, k)$ we consider for $k = 0, 1, \dots, h^{-1} - 1$, $j \ge 1$

$$\varphi_{jk}(t) = \sqrt{2}\,h^{-1/2} \cos\big(j\pi(t - kh)/h\big)\,\mathbf{1}_{[kh,(k+1)h]}(t), \quad t \in [0,1]. \tag{4.1}$$

This gives the corresponding antiderivatives

$$\Phi_{jk}(t) = \frac{\sqrt{2h}}{\pi j} \sin\big(j\pi(t - kh)/h\big)\,\mathbf{1}_{[kh,(k+1)h]}(t), \quad t \in [0,1].$$

Not only are the $(\varphi_{jk})$ and $(\Phi_{jk})$ localized on each block, but each single family of functions is also orthogonal in $L^2([0,1])$. Working again on the piecewise constant experiment $\mathcal{E}_2$, we extract the observations

$$y_{jk} := \int \varphi_{jk}(t)\,dY_t = \big(h^2 \pi^{-2} j^{-2} \sigma^2(kh) + \varepsilon^2\big)^{1/2} \zeta_{jk}, \quad j \ge 1,\; k = 0, \dots, h^{-1} - 1, \tag{4.2}$$

with $\zeta_{jk} \sim N(0,1)$ independent over all $(j, k)$. The same transformation as before leads for each $j \ge 1$ to the regression model for $k = 0, \dots, h^{-1} - 1$

$$z_{jk} := \log(y_{jk}^2) - \log(h^2 \pi^{-2} j^{-2}) - E[\log(\zeta_{jk}^2)] = \log\big(\sigma^2(kh) + \varepsilon^2 h^{-2} \pi^2 j^2\big) + \eta_{jk}. \tag{4.3}$$

Applying the asymptotic equivalence result by Grama and Nussbaum (2002) for each independent level $j$ separately, we immediately generalize Theorem 3.3.

4.1 Theorem. For $\alpha > 1/2$ and $\underline{\sigma}^2 > 0$ the high frequency experiment $\mathcal{E}_1(\varepsilon, \alpha, R, \underline{\sigma}^2)$ is asymptotically more informative than the combined experiment $\mathcal{G}_2(\varepsilon, \alpha, R, \underline{\sigma}^2, h_0, J)$ of independent Gaussian shifts

$$dZ^j_t = \frac{1}{\sqrt{2}} \log\big(\sigma^2(t) + h_0^{-2} \pi^2 j^2\big)\,dt + h_0^{1/2} \varepsilon^{1/2}\,dW^j_t, \quad t \in [0,1],\; j = 1, \dots, J,$$

with independent Brownian motions $(W^j)_{j = 1, \dots, J}$ and $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$. The constants $h_0 > 0$ and $J \in \mathbb{N}$ are arbitrary, but fixed.

4.2 Remark. Let us again study the LAN-property of the constant parametric case $\sigma^2(t) = \sigma_0^2 > 0$ for the local alternatives $\sigma_\varepsilon^2 = \sigma_0^2 + \varepsilon^{1/2}$. We obtain the Fisher information

$$I_{h_0,J} = \sum_{j=1}^{J} \frac{(2h_0)^{-1} h_0^4}{(\pi^2 j^2 + h_0^2 \sigma_0^2)^2}.$$

In the limit $J \to \infty$ and $h_0 \to \infty$ we obtain by Riemann sum approximation

$$\lim_{h_0 \to \infty} \lim_{J \to \infty} I_{h_0,J} = \int_0^\infty \frac{dx}{2\pi (x^2 + \sigma_0^2)^2} = \frac{1}{8\sigma_0^3}.$$

This is exactly the optimal Fisher information, obtained by Gloter and Jacod (2001a) in this case. Note, however, that it is not at all obvious that we may let $J, h_0 \to \infty$ in the asymptotic equivalence result. Moreover, in our theory the restriction $h^\alpha = o(\varepsilon^{1/2})$ is necessary, which translates into $h_0 = o(\varepsilon^{(1-2\alpha)/(2\alpha)})$. Still, the positive aspect is that we can come as close as we wish to an asymptotically almost equivalent, but much simpler model.
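The Riemann sum argument of the remark can be illustrated numerically. The sketch below sums the per-frequency contributions $(2h_0)^{-1} h_0^4/(\pi^2 j^2 + h_0^2\sigma_0^2)^2$, each obtained as in Section 3 with $\pi$ replaced by $\pi j$, and compares the total with the limit $1/(8\sigma_0^3)$. The function name, the truncation rule for $J$ and the value $\sigma_0 = 1$ are assumptions made for this illustration.

```python
import math

def fisher_info_g2(h0, sigma0, J):
    """Sum over frequencies j = 1..J of (2 h0)^{-1} h0^4 / (pi^2 j^2 + h0^2 sigma0^2)^2."""
    return sum(
        h0 ** 3 / (2.0 * (math.pi ** 2 * j ** 2 + h0 ** 2 * sigma0 ** 2) ** 2)
        for j in range(1, J + 1)
    )

sigma0 = 1.0
optimal = 1.0 / (8.0 * sigma0 ** 3)
for h0 in (1.0, 10.0, 100.0):
    J = int(200 * h0)      # large enough that the truncated tail is negligible
    # the ratio increases towards 1 as h0 grows
    print(h0, fisher_info_g2(h0, sigma0, J) / optimal)
```

The ratio approaches one only slowly in $h_0$, which matches the statement that the information loss can be made arbitrarily small but not zero for fixed constants.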

5 Localisation 

We know from standard regression theory (Stone 1982) that in the experiment $\mathcal{G}_1$ we can estimate $\sigma^2 \in C^\alpha$ in sup-norm with rate $(\varepsilon \log(\varepsilon^{-1}))^{\alpha/(2\alpha+1)}$, using that the log-function is a $C^\infty$-diffeomorphism for arguments bounded away from zero and infinity. Since $\mathcal{E}_1$ is for $\alpha > 1/2$ asymptotically more informative than $\mathcal{G}_1$, we can therefore localize $\sigma^2$ in a neighbourhood of some $\sigma_0^2$. Using the local coordinate $s^2$ in $\sigma^2 = \sigma_0^2 + v_\varepsilon s^2$ for $v_\varepsilon \to 0$ we define a localized experiment, cf. Nussbaum (1996).

5.1 Definition. Let $\mathcal{E}_{i,loc} = \mathcal{E}_{i,loc}(\sigma_0, \varepsilon, \alpha, R, \underline{\sigma}^2)$ for $\sigma_0^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$ be the statistical subexperiment obtained from $\mathcal{E}_i(\varepsilon, \alpha, R, \underline{\sigma}^2)$ by restricting to the parameters $\sigma^2 = \sigma_0^2 + v_\varepsilon s^2$ with $v_\varepsilon = \varepsilon^{\alpha/(2\alpha+1)} \log(\varepsilon^{-1})$ and unknown $s^2 \in C^\alpha(R)$.

We shall consider the observations $(y_{jk})$ in (4.2) derived from $\mathcal{E}_{2,loc}$ and multiplied by $\pi j/h$. The model is then a generalized nonparametric regression family in the sense of Grama and Nussbaum (2002). On the sequence space $(\mathcal{X}, \mathcal{F}) = (\mathbb{R}^{\mathbb{N}}, \mathcal{B}^{\otimes \mathbb{N}})$ we consider for $\vartheta \in \Theta = [\underline{\sigma}^2, R]$ the Gaussian product measure

$$P_\vartheta = \bigotimes_{j \ge 1} N\big(0, \vartheta + h_0^{-2} \pi^2 j^2\big). \tag{5.1}$$

The parameter $\vartheta$ plays the role of $\sigma^2(kh)$ for each $k$. By independence and the result for the one-dimensional Gaussian scale model, the Fisher information for $\vartheta$ is given by

$$I(\vartheta) = \sum_{j \ge 1} \frac{1}{2\big(\vartheta + h_0^{-2} \pi^2 j^2\big)^2}, \tag{5.2}$$

where the series is evaluated in Section 9.6 using Fourier analysis. Since we shall later let $h_0$ tend to infinity, an essential point is the asymptotics $I(\vartheta) \sim h_0$.

We split our observation design $\{kh \mid k = 0, \dots, h^{-1}\}$ into blocks $A_m = \{kh \mid k = (m-1)\ell, \dots, m\ell - 1\}$, $m = 1, \dots, (\ell h)^{-1}$, of length $\ell$ such that the radius $v_\varepsilon$ of our nonparametric local neighbourhood has the order of the parametric noise level $(I(\vartheta)\ell)^{-1/2}$ in each block:

$$v_\varepsilon \sim (I(\vartheta)\ell)^{-1/2} \;\Rightarrow\; \ell \sim h_0^{-1} v_\varepsilon^{-2}.$$

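For later convenience we consider odd and even indices $k$ separately, assuming that $h^{-1}$ and $\ell$ are even integers. This way, for each block $m$ observing $(y_{jk} \pi j/h)$ for $j \ge 1$ and $k \in A_m$, $k$ odd respectively $k$ even, can be modeled by the experiments

$$\mathcal{E}^{odd}_{2,m} = \Big(\mathcal{X}^{\ell/2},\, \mathcal{F}^{\otimes \ell/2},\, \Big(\bigotimes_{k \in A_m\ \mathrm{odd}} P_{\sigma_0^2(kh) + v_\varepsilon s^2(kh)}\Big)_{s^2 \in C^\alpha(R)}\Big), \tag{5.3}$$

$$\mathcal{E}^{even}_{2,m} = \Big(\mathcal{X}^{\ell/2},\, \mathcal{F}^{\otimes \ell/2},\, \Big(\bigotimes_{k \in A_m\ \mathrm{even}} P_{\sigma_0^2(kh) + v_\varepsilon s^2(kh)}\Big)_{s^2 \in C^\alpha(R)}\Big), \tag{5.4}$$

where all parameters are the same as for $\mathcal{E}_{2,loc}$. Using the nonparametric local asymptotic theory developed by Grama and Nussbaum (2002) and the independence of the experiments $(\mathcal{E}^{odd}_{2,m})_m$ (resp. $(\mathcal{E}^{even}_{2,m})_m$), we are able to prove in Section 9.4 the following asymptotic equivalence.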

5.2 Proposition. Assume $\alpha > 1/2$, $\underline{\sigma}^2 > 0$ and $h_0 \sim \varepsilon^{-p}$ with $p \in (0, 1 - (2\alpha)^{-1})$ such that $(2h)^{-1} \in \mathbb{N}$. Then observing $\{y_{j,2k+1} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,loc}$ is asymptotically equivalent to the local Gaussian shift experiment $\mathcal{G}_{3,loc}$ of observing

$$dY_t = \frac{1}{\sqrt{8}\,\sigma_0^{3/2}(t)} \Big(1 - \frac{2}{\sigma_0(t) h_0}\Big)^{1/2} v_\varepsilon s^2(t)\,dt + (2\varepsilon)^{1/2}\,dW_t, \quad t \in [0,1], \tag{5.5}$$

where the unknown $s^2$ and all parameters are the same as in $\mathcal{E}_{2,loc}$. The Le Cam distance tends to zero uniformly over the center of localisation $\sigma_0^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

The same asymptotic equivalence result holds true for observing $\{y_{j,2k} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,loc}$.

Note that in this model, combining even and odd indices $k$, we can already infer the LAN-result by Gloter and Jacod (2001a), but we still face a second order term of order $h_0^{-1} v_\varepsilon$ in the drift. This term is asymptotically negligible only if it is of smaller order than the noise level $\varepsilon^{1/2}$. To be able to choose $h_0$ sufficiently large, we have to require a larger Hölder smoothness of the volatility.

5.3 Corollary. Assume $\alpha > (1 + \sqrt{17})/8 \approx 0.64$, $\underline{\sigma}^2 > 0$ and $(2h)^{-1} \in \mathbb{N}$. Then observing $\{y_{j,2k+1} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,loc}$ is asymptotically equivalent to the local Gaussian shift experiment $\mathcal{G}_{4,loc}$ of observing

$$dY_t = \frac{1}{\sqrt{8}\,\sigma_0^{3/2}(t)}\, v_\varepsilon s^2(t)\,dt + (2\varepsilon)^{1/2}\,dW_t, \quad t \in [0,1], \tag{5.6}$$

where the unknown $s^2$ and all parameters are the same as in $\mathcal{E}_{2,loc}$. The Le Cam distance tends to zero uniformly over the center of localisation $\sigma_0^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

The same asymptotic equivalence result holds true for observing $\{y_{j,2k} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,loc}$.

Proof. For $\alpha > (1 + \sqrt{17})/8$ the choice of $h_0 = \varepsilon^{-p}$ for some $p \in ((4\alpha + 2)^{-1}, 1 - (2\alpha)^{-1})$ is possible and ensures that $h^\alpha = o(\varepsilon^{1/2})$ holds as well as $h_0^{-1} = o(v_\varepsilon^{-1} \varepsilon^{1/2})$. Therefore the Kullback-Leibler divergence between the observations in $\mathcal{G}_{3,loc}$ and in $\mathcal{G}_{4,loc}$ evaluates by the Cameron-Martin (or Girsanov) formula to

$$\frac{1}{4\varepsilon} \int_0^1 \frac{1}{8\sigma_0^3(t)} \Big(1 - \Big(1 - \frac{2}{\sigma_0(t) h_0}\Big)^{1/2}\Big)^2 v_\varepsilon^2 s^4(t)\,dt \lesssim \varepsilon^{-1} h_0^{-2} v_\varepsilon^2.$$

Consequently, the Kullback-Leibler and thus also the total variation distance tends to zero. □
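The smoothness threshold in the corollary comes from two competing requirements on the exponent $p$ in $h_0 = \varepsilon^{-p}$. The sketch below spells out this bookkeeping under the assumption that logarithmic factors in $v_\varepsilon$ are ignored: the block condition $h^\alpha = o(\varepsilon^{1/2})$ with $h = h_0\varepsilon$ gives $p < 1 - (2\alpha)^{-1}$, and the requirement that the second order term $h_0^{-1} v_\varepsilon$ be of smaller order than $\varepsilon^{1/2}$ gives $p > (4\alpha+2)^{-1}$. The function names are hypothetical.

```python
import math

def admissible_p_interval(alpha, lower):
    """Open interval of exponents p for h0 = eps^(-p), or None if it is empty.

    upper bound: h^alpha = o(eps^(1/2)) with h = h0 * eps forces p < 1 - 1/(2 alpha)
    lower bound: supplied by the caller; log factors in v_eps are ignored
    """
    upper = 1.0 - 1.0 / (2.0 * alpha)
    return (lower, upper) if lower < upper else None

def interval_corollary_5_3(alpha):
    """Exponent range when h0^(-1) v_eps must be of smaller order than eps^(1/2)."""
    # v_eps ~ eps^(alpha / (2 alpha + 1)) up to logs, hence p > 1/2 - alpha/(2 alpha + 1)
    return admissible_p_interval(alpha, 1.0 / (4.0 * alpha + 2.0))

threshold = (1.0 + math.sqrt(17.0)) / 8.0
print(threshold)                                  # critical smoothness, roughly 0.64
print(interval_corollary_5_3(threshold - 0.01))   # None: no admissible p
print(interval_corollary_5_3(threshold + 0.01))   # a small non-empty interval
```

Equating the two bounds leads to $4\alpha^2 - \alpha - 1 = 0$, whose positive root is $(1+\sqrt{17})/8$.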

In a last step we find local experiments $\mathcal{G}_{5,loc}$, which are asymptotically equivalent to $\mathcal{G}_{4,loc}$ and do not depend on the center of localisation $\sigma_0^2$. To this end we use a variance-stabilizing transform, based on the Taylor expansion

$$\sqrt{2}\,x^{1/4} = \sqrt{2}\,x_0^{1/4} + \frac{1}{\sqrt{8}}\,x_0^{-3/4}(x - x_0) + O((x - x_0)^2)$$

which holds uniformly over $x, x_0$ on any compact subset of $(0, \infty)$. Inserting $x = \sigma^2(t) = \sigma_0^2(t) + v_\varepsilon s^2(t)$ and $x_0 = \sigma_0^2(t)$ from our local model, we obtain

$$\sqrt{2\sigma(t)} = \sqrt{2\sigma_0(t)} + \frac{1}{\sqrt{8}}\,\sigma_0^{-3/2}(t)\,v_\varepsilon s^2(t) + O(v_\varepsilon^2). \tag{5.7}$$

Since $v_\varepsilon^2 = o(\varepsilon^{1/2})$ holds for $\alpha > 1/2$, we can add the uninformative signal $\sqrt{2\sigma_0(t)}$ to $Y$ in $\mathcal{G}_{4,loc}$, replace the drift by $\sqrt{2\sigma(t)}$ and still keep convergence of the total variation distance, compare the preceding proof. Consequently, from Corollary 5.3 we obtain the following result.

5.4 Corollary. Assume $\alpha > (1 + \sqrt{17})/8 \approx 0.64$, $\underline{\sigma}^2 > 0$ and $(2h)^{-1} \in \mathbb{N}$. Then observing $\{y_{j,2k+1} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in the experiment $\mathcal{E}_{2,loc}$ is asymptotically equivalent to the local Gaussian shift experiment $\mathcal{G}_{5,loc}$ of observing

$$dY_t = \sqrt{2\sigma(t)}\,dt + (2\varepsilon)^{1/2}\,dW_t, \quad t \in [0,1], \tag{5.8}$$

where the unknown is $\sigma^2 = \sigma_0^2 + v_\varepsilon s^2$ and all parameters are the same as in $\mathcal{E}_{2,loc}$. The Le Cam distance tends to zero uniformly over the center of localisation $\sigma_0^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

The same asymptotic equivalence result holds true for observing $\{y_{j,2k} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_{2,loc}$.

6 Globalisation 

The globalisation now basically follows the usual route, first established by Nussbaum (1996). Essential for us is to show that observing $(y_{jk})$ for $j \ge 1$ is asymptotically sufficient in $\mathcal{E}_2$. Then we can split the white noise observation experiment $\mathcal{E}_2$ into two independent sub-experiments obtained from $(y_{jk})$ for $k$ odd and $k$ even, respectively. Usually, a white noise experiment can be split into two independent subexperiments with the same drift and an increase by $\sqrt{2}$ in the noise level. Here, however, this does not work since the two diffusions in the random drift remain the same and thus independence fails.

Let us introduce the $L^2$-normalized step functions

$$\varphi_{0,k}(t) := (2h)^{-1/2}\big(\mathbf{1}_{[(k-1)h,kh]}(t) - \mathbf{1}_{[kh,(k+1)h]}(t)\big), \quad k = 1, \dots, h^{-1} - 1, \qquad \varphi_{0,0}(t) := h^{-1/2}\,\mathbf{1}_{[0,h]}(t).$$

We obtain a normalized complete basis $(\varphi_{jk})_{j \ge 0,\, 0 \le k \le h^{-1} - 1}$ of $L^2([0,1])$ such that observing $Y$ in experiment $\mathcal{E}_2$ is equivalent to observing

$$y_{jk} := \int_0^1 \varphi_{jk}(t)\,dY_t, \quad j \ge 0,\; k = 0, \dots, h^{-1} - 1.$$
Jo 

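Calculating the Fourier series, we can express the tent function $\Phi_{0,k}$ with $\Phi'_{0,k} = \varphi_{0,k}$ and $\Phi_{0,k}(1) = 0$ as an $L^2$-convergent series over the dilated sine functions $\Phi_{jk}$ and $\Phi_{j,k-1}$, $j \ge 1$:

$$\Phi_{0,k}(t) = \sum_{j \ge 1} (-1)^{j+1} \Phi_{j,k-1}(t) + \sum_{j \ge 1} \Phi_{jk}(t), \quad k = 1, \dots, h^{-1} - 1. \tag{6.1}$$

We also have $\Phi_{0,0}(t) = -\sqrt{2} \sum_{j \ge 1} \Phi_{j,0}(t)$. By partial integration, this implies (with $L^2$-convergence)

$$\beta_{0,k} = \sum_{j \ge 1} (-1)^{j+1} \beta_{j,k-1} + \sum_{j \ge 1} \beta_{jk}, \quad\text{where } \beta_{jk} := \langle \varphi_{jk}, X \rangle,$$

for $k \ge 1$ and similarly $\beta_{0,0} = -\sqrt{2} \sum_{j \ge 1} \beta_{j,0}$. This means that the signal $\beta_{0,k}$ in $y_{0,k}$ can be perfectly reconstructed from the signals in the $y_{j,k-1}$, $y_{jk}$. For jointly Gaussian random variables we obtain the conditional law in $\mathcal{E}_2$

$$\mathcal{L}(\beta_{jk} \mid y_{jk}) = N\Big(\frac{\mathrm{Var}(\beta_{jk})}{\mathrm{Var}(y_{jk})}\,y_{jk},\; \varepsilon^2\,\frac{\mathrm{Var}(\beta_{jk})}{\mathrm{Var}(y_{jk})}\Big).$$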

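Given the results by Stone (1982) and our less informative Gaussian shift experiment $\mathcal{G}_1$ for $\alpha > 1/2$, $\underline{\sigma}^2 > 0$, there is an estimator $\hat\sigma^2_\varepsilon$ based on $(y_{1,k})_k$ in $\mathcal{E}_2$ with

$$\lim_{\varepsilon \to 0}\, \inf_{\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)} P_{\sigma^2}\big(\|\hat\sigma^2_\varepsilon - \sigma^2\|_\infty \le R v_\varepsilon\big) = 1, \tag{6.2}$$

where $v_\varepsilon = \varepsilon^{\alpha/(2\alpha+1)} \log(\varepsilon^{-1})$ as in the definitions of the localized experiments.

We can thus generate independent $N(0,1)$-distributed random variables $\rho_{jk}$ to construct from $(y_{jk})_{j \ge 1, k}$

$$\beta'_{jk} := \frac{\widehat{\mathrm{Var}}(\beta_{jk})}{\widehat{\mathrm{Var}}(y_{jk})}\,y_{jk} + \Big(\varepsilon^2\,\frac{\widehat{\mathrm{Var}}(\beta_{jk})}{\widehat{\mathrm{Var}}(y_{jk})}\Big)^{1/2} \rho_{jk},$$

where the variance $\widehat{\mathrm{Var}}$ is the expression for $\mathrm{Var}$ where all unknown values $\sigma^2(kh)$ are replaced by the estimated values $\hat\sigma^2_\varepsilon(kh)$. From this we can generate artificial observations $(y'_{0,k})$ such that the conditional law $\mathcal{L}((y'_{0,k})_k \mid (\beta'_{0,k})_k)$ coincides with $\mathcal{L}((y_{0,k})_k \mid (\beta_{0,k})_k)$, which is just that of the signals plus multivariate normal noise with mean zero and tri-diagonal covariance matrix $\varepsilon^2 (\langle \varphi_{0,k}, \varphi_{0,k'} \rangle)_{k,k'}$.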
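The Fourier expansion (6.1) of the tent function can be checked numerically at interior points of the two blocks. The sketch below uses the dilated sine functions $\Phi_{jk}(t) = \sqrt{2h}\,(\pi j)^{-1}\sin(j\pi(t-kh)/h)$ from Section 4 and takes the tent function to be the antiderivative of $\varphi_{0,k}$ that vanishes outside $[(k-1)h,(k+1)h]$; the function names, the block size and the evaluation points are assumptions made for this illustration.

```python
import math

def Phi(j, k, t, h):
    """Dilated sine Phi_jk(t) = sqrt(2h)/(pi j) * sin(j pi (t - kh)/h) on [kh, (k+1)h]."""
    if not (k * h <= t <= (k + 1) * h):
        return 0.0
    return math.sqrt(2.0 * h) / (math.pi * j) * math.sin(j * math.pi * (t - k * h) / h)

def tent(k, t, h):
    """Antiderivative of phi_{0,k}: rises on [(k-1)h, kh], falls on [kh, (k+1)h]."""
    if (k - 1) * h <= t <= k * h:
        return (t - (k - 1) * h) / math.sqrt(2.0 * h)
    if k * h < t <= (k + 1) * h:
        return ((k + 1) * h - t) / math.sqrt(2.0 * h)
    return 0.0

def series(k, t, h, terms):
    """Partial sum of sum_j (-1)^(j+1) Phi_{j,k-1}(t) + sum_j Phi_{j,k}(t)."""
    return sum((-1) ** (j + 1) * Phi(j, k - 1, t, h) + Phi(j, k, t, h)
               for j in range(1, terms + 1))

h = 0.25
for t in (0.375, 0.6):
    # partial sums approach the tent function away from the kink at t = kh
    print(t, tent(2, t, h), series(2, t, h, 5000))
```

Convergence holds in $L^2$ and pointwise in the interior of each block; at the kink $t = kh$ every partial sum is zero, so pointwise agreement cannot be expected there.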

In Section 9.5 we shall prove that the Hellinger distance between the families of centered Gaussian random variables $\mathcal{Y} := \{y_{jk} \mid j \ge 0,\, k = 0, \dots, h^{-1} - 1\}$ and $\mathcal{Y}' := \{y'_{0,k} \mid k = 0, \dots, h^{-1} - 1\} \cup \{y_{jk} \mid j \ge 1,\, k = 0, \dots, h^{-1} - 1\}$ tends to zero, provided $h_0^{-1} v_\varepsilon^2 = o(\varepsilon)$, which is possible when $\alpha > (1 + \sqrt{5})/4$ with the choice $h_0 = \varepsilon^{-p}$ for some $p \in ((2\alpha + 1)^{-1}, 1 - (2\alpha)^{-1})$.

6.1 Proposition. Assume $\alpha > (1 + \sqrt{5})/4 \approx 0.81$, $\underline{\sigma}^2 > 0$ and $h^{-1}$ an even integer. Then the experiment $\mathcal{E}_2$ is asymptotically equivalent to the product experiment $\mathcal{E}_{2,odd} \otimes \mathcal{E}_{2,even}$ where $\mathcal{E}_{2,odd}$ is obtained from the observations $\{y_{j,2k+1} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ and $\mathcal{E}_{2,even}$ from the observations $\{y_{j,2k} \mid j \ge 1,\, k = 0, \dots, (2h)^{-1} - 1\}$ in experiment $\mathcal{E}_2$.

This key result permits us to globalize the local result. In the sequel we always assume $\alpha > (1 + \sqrt{5})/4$ and $\underline{\sigma}^2 > 0$. We start with the asymptotic equivalence between $\mathcal{E}_2$ and $\mathcal{E}_{2,odd} \otimes \mathcal{E}_{2,even}$. Using again an estimator $\hat\sigma^2_\varepsilon$ in $\mathcal{E}_{2,odd}$ satisfying (6.2) we can localize the second factor $\mathcal{E}_{2,even}$ around $\hat\sigma^2_\varepsilon$ and therefore by Corollary 5.4 replace it by experiment $\mathcal{G}_{5,loc}$, see Theorem 3.2 in Nussbaum (1996) for a formal proof. Since $\mathcal{G}_{5,loc}$ does not depend on the center $\hat\sigma^2_\varepsilon$, we conclude that $\mathcal{E}_2$ is asymptotically equivalent to the product experiment $\mathcal{E}_{2,odd} \otimes \mathcal{G}_5$ where $\mathcal{G}_5$ has the same parameters as $\mathcal{E}_2$ and is given by observing $Y$ in (5.8). Now we use an estimator $\hat\sigma^2_\varepsilon$ in $\mathcal{G}_5$ satisfying (6.2), whose existence is ensured by Stone (1982), to localize $\mathcal{E}_{2,odd}$. Corollary 5.4 then allows again to replace the localized $\mathcal{E}_{2,odd}$-experiment by $\mathcal{G}_5$ such that $\mathcal{E}_2$ is asymptotically equivalent to the product experiment $\mathcal{G}_5 \otimes \mathcal{G}_5$. Finally, taking the mean of the independent observations (5.8) in both factors, which is a sufficient statistic (or, abstractly, due to identical likelihood processes), we see that $\mathcal{G}_5 \otimes \mathcal{G}_5$ is equivalent to the experiment $\mathcal{G}_0$ of observing $dY_t = \sqrt{2\sigma(t)}\,dt + \sqrt{\varepsilon}\,dW_t$, $t \in [0,1]$. Our final result then follows from the asymptotic equivalence between $\mathcal{E}_0$ and $\mathcal{E}_1$ as well as between $\mathcal{E}_1$ and $\mathcal{E}_2$.

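6.2 Theorem. Assume $\alpha > (1 + \sqrt{5})/4 \approx 0.81$ and $\delta, \underline{\sigma}^2, R > 0$. Then the regression experiment $\mathcal{E}_0(n, \delta, \alpha, R, \underline{\sigma}^2)$ is for $n \to \infty$ asymptotically equivalent to the Gaussian shift experiment $\mathcal{G}_0(\delta n^{-1/2}, \alpha, R, \underline{\sigma}^2)$ of observing

$$dY_t = \sqrt{2\sigma(t)}\,dt + \delta^{1/2} n^{-1/4}\,dW_t, \quad t \in [0,1], \tag{6.3}$$

for $\sigma^2 \in \mathcal{S}(\alpha, R, \underline{\sigma}^2)$.

7 Discussion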

Our results show that inference for the volatility in the high-frequency observation model under microstructure noise $\mathcal{E}_0$ is asymptotically as difficult as in the well understood Gaussian shift model $\mathcal{G}_0$. Remark that the constructions in Gloter and Jacod (2001a), Gloter and Jacod (2001b) rely on preliminary estimators at the boundary of suitable blocks, while we require $\operatorname{supp} \Phi_{jk} = [kh, (k+1)h]$ to obtain independence among blocks. In this context Proposition 6.1 shows asymptotic sufficiency of observing only the pinned process $X_t - X_{kh} - \frac{t - kh}{h}(X_{(k+1)h} - X_{kh})$, $t \in [kh, (k+1)h]$, on each block due to $\int (\alpha t + \beta)\varphi_{jk}(t)\,dt = 0$ for $j \ge 1$, $\alpha, \beta \in \mathbb{R}$. Naturally, the $(\Phi_{jk})_{j \ge 1}$ form exactly the eigenfunctions of the covariance operator of the Brownian bridge.

It is interesting to note that both model $\mathcal{E}_0$ and model $\mathcal{G}_0$ are homogeneous in the sense that factors from the noise (i.e. the $dW_t$-term) can be moved to the drift term and vice versa such that for example high volatility can counterbalance a high noise level $\delta$ or a large observation distance $1/n$. Another phenomenon is that observing $\mathcal{E}_0$ $m$-times independently, in particular with different realisations of the process $X$, is asymptotically as informative as observing $\mathcal{E}_0$ with $m^2$ as many observations: both experiments are asymptotically equivalent to $dY_t = \sqrt{2\sigma(t)}\,dt + m^{-1/2} \delta^{1/2} n^{-1/4}\,dW_t$. Similarly, by rescaling we can treat observations on intervals $[0,T]$ with $T > 0$ fixed: Observing $Y_i = X_{iT/n} + \varepsilon_i$, $i = 1, \dots, n$, in $\mathcal{E}_0$ with $X_t = \int_0^t \sigma(s)\,dB_s$, $t \in [0,T]$, is under the same conditions asymptotically equivalent to observing

$$d\tilde{Y}_u = \sqrt{2\sigma(Tu)}\,du + \delta^{1/2} T^{-1/4} n^{-1/4}\,dW_u, \quad u \in [0,1],$$

or equivalently,

$$d\bar{Y}_v = \sqrt{2\sigma(v)}\,dv + \delta^{1/2} T^{1/4} n^{-1/4}\,d\bar{W}_v, \quad v \in [0,T].$$

Concerning the various restrictions on the smoothness $\alpha$ of the volatility $\sigma^2$, one might wonder whether the critical index is $\alpha = 1/2$ in view of the classical asymptotic equivalence results (Brown and Low 1996, Nussbaum 1996). In our approach, we still face the second order term in (5.5) and using the localized results, a much easier globalisation yields for $\alpha > 1/2$ only that $\mathcal{E}_0$ is asymptotically not less informative than observing

$$dY_t = F(\sigma^2(t))\,dt + \delta^{1/2} n^{-1/4}\,dW_t, \quad t \in [0,1],$$

with $F(x) = \int_1^x (y^{-3/2} - 2h_0^{-1} y^{-2})^{1/2}\,dy/\sqrt{8}$, which includes a small, but non-negligible second-order term since $h_0$ cannot tend to infinity too quickly.

On the other hand, it is quite easy to see that for $\alpha \le 1/4$ asymptotic equivalence fails. In the regression model $\mathcal{E}_0$ with $n$ observations we cannot distinguish between $X_n(t) = \int_0^t \sigma_n(s)\,dB_s$ with $\sigma_n^2(t) = 1 + n^{-1/4} \cos(\pi n t)$, $\|\sigma_n^2\|_{C^{1/4}} = 2 +$ […], and standard Brownian motion ($\sigma^2 = 1$) since $X_n(i/n) - X_n((i-1)/n) \sim N(0, 1/n)$ i.i.d. holds. On the other hand, we have $\int_0^1 (\sqrt{2\sigma_n(t)} - \sqrt{2})^2\,dt \sim n^{-1/2}$ which shows that the signal to noise ratio in the Gaussian shift $\mathcal{G}_0$ is of order 1 and a Neyman-Pearson test between $\sigma_n^2$ and 1 can distinguish both signals with a positive probability. This different behaviour for testing in $\mathcal{E}_0$ and $\mathcal{G}_0$ implies that both models cannot be asymptotically equivalent for $\alpha = 1/4$. Note that Gloter and Jacod (2001a) merely require $\alpha \ge 1/4$ for their LAN-result, but our counterexample is excluded by their parametric setting. In conclusion, the behaviour in the zone $\alpha \in (1/4, (1 + \sqrt{5})/4]$ remains unexplored.
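Both halves of the counterexample can be verified numerically. The sketch below checks that the oscillating volatility $\sigma_n^2(t) = 1 + n^{-1/4}\cos(\pi n t)$ integrates to exactly $1/n$ over each sampling interval, and that the squared $L^2$-distance between the signals $\sqrt{2\sigma_n}$ and $\sqrt{2}$ is of order $n^{-1/2}$. The function names, the midpoint quadrature and the sample size are assumptions made for this illustration.

```python
import math

def variance_increment(n, i, grid=2000):
    """int_{(i-1)/n}^{i/n} sigma_n^2(t) dt with sigma_n^2(t) = 1 + n^(-1/4) cos(pi n t)."""
    a = (i - 1) / n
    width = 1.0 / n
    total = 0.0
    for m in range(grid):
        t = a + (m + 0.5) * width / grid
        total += 1.0 + n ** -0.25 * math.cos(math.pi * n * t)
    return total * width / grid

def squared_signal_distance(n, grid_per_cell=200):
    """int_0^1 (sqrt(2 sigma_n(t)) - sqrt(2))^2 dt by the midpoint rule."""
    cells = n * grid_per_cell
    total = 0.0
    for m in range(cells):
        t = (m + 0.5) / cells
        sigma_n = math.sqrt(1.0 + n ** -0.25 * math.cos(math.pi * n * t))
        total += (math.sqrt(2.0 * sigma_n) - math.sqrt(2.0)) ** 2
    return total / cells

n = 256
print(n * variance_increment(n, 7))               # increments have variance 1/n
print(math.sqrt(n) * squared_signal_distance(n))  # stays bounded away from 0 and infinity
```

A second-order expansion of $x \mapsto \sqrt{2}(1+x)^{1/4}$ suggests that the rescaled distance in the last line is close to $1/16$ for large $n$.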

8 Applications 

Let us first consider the nonparametric problem of estimating the spot volatility $\sigma^2(t)$. From our asymptotic equivalence result in Theorem 6.2 we can deduce, at least for bounded loss functions, the usual nonparametric minimax rates, but with the number $n$ of observations replaced by $\sqrt{n}$, provided $\sigma^2 \in C^\alpha$ for $\alpha > (1+\sqrt{5})/4$, as the mapping $\sqrt{2\sigma(t)} \mapsto \sigma^2(t)$ is a $C^\infty$-diffeomorphism for volatilities bounded away from zero. Since the results obtained so far only concern rates, it is even simpler to use our less informative model $\mathcal{E}_2$ or more concretely the observations $(y_{jk})$, which are independent in $\mathcal{E}_2$, centered and of variance $h^2\pi^{-2}j^{-2}\sigma^2(kh) + \varepsilon^2$. With $h = \varepsilon$ a local (kernel or wavelet) averaging over $\varepsilon^{-2}\pi^2y_{1k}^2 - \pi^2$ therefore yields rate-optimal estimators for classical pointwise or $L^p$-type loss functions.
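For later use we choose $h = \varepsilon$ in $\mathcal{E}_2$ and propose the simple estimator

$$\hat\sigma_b^2(t) := \frac{\varepsilon}{2b}\sum_{k:\,|k\varepsilon - t| \le b}\bigl(\varepsilon^{-2}\pi^2y_{1k}^2 - \pi^2\bigr)$$

for some bandwidth $b > 0$. Since $y_{1k}^2/\operatorname{Var}(y_{1k})$ is $\chi^2(1)$-distributed, it is standard (Stone 1982) to show that with the choice $b \sim (\varepsilon\log(\varepsilon^{-1}))^{1/(2\alpha+1)}$ we have the sup-norm risk bound

$$\mathbb{E}\bigl[\|\hat\sigma_b^2 - \sigma^2\|_\infty^2\bigr] \lesssim (\varepsilon\log(\varepsilon^{-1}))^{2\alpha/(2\alpha+1)},$$

especially we shall need that $\hat\sigma_b^2$ is consistent in sup-norm loss.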
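
A minimal simulation sketch of this local averaging idea, assuming only that the first-frequency statistics satisfy $y_{1k} \sim N(0, \varepsilon^2\pi^{-2}\sigma^2(k\varepsilon) + \varepsilon^2)$ independently (the case $h = \varepsilon$). The helper names are hypothetical, and the estimator below averages over the blocks inside the window instead of using an explicit normalising constant.

```python
import math
import random

def simulate_y1(sigma2, eps, rng):
    """Draw y_{1k} ~ N(0, eps^2 pi^-2 sigma^2(k eps) + eps^2) for all blocks k."""
    blocks = int(1.0 / eps)
    return [rng.gauss(0.0, math.sqrt(eps**2 * sigma2(k * eps) / math.pi**2 + eps**2))
            for k in range(blocks)]

def spot_volatility(y1, eps, t, b):
    """Average the unbiased statistics eps^-2 pi^2 y_{1k}^2 - pi^2 over |k eps - t| <= b."""
    stats = [math.pi**2 * (y / eps) ** 2 - math.pi**2
             for k, y in enumerate(y1) if abs(k * eps - t) <= b]
    return sum(stats) / len(stats)

rng = random.Random(0)
eps = 1e-4
y1 = simulate_y1(lambda t: 1.0 + 0.5 * math.sin(2.0 * math.pi * t), eps, rng)
print(spot_volatility(y1, eps, t=0.25, b=0.1))  # target value sigma^2(0.25) = 1.5, up to noise
```

Each single statistic is very noisy (its standard deviation is of order $\sqrt{2}(\sigma^2 + \pi^2)$), which is why many blocks have to be averaged and why the effective sample size is of order $\varepsilon^{-1} \sim \sqrt{n}$ rather than $n$.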

In terms of the regression experiment $\mathcal{E}_0$ we work (in an asymptotically equivalent way) with the linear interpolation $\hat Y'$ of the observations $(Y_i)$, see the proof of Theorem 2.2. By partial integration we can thus take for any $j, k$

'Jo J{^-l)/n ' 

setting Yq '■= 0. Note that we have the uniform approximation y^^, — 
- + 0(/i-i/27i-i) due to \W,k\\oo ^ (2/i)-i/2. We see 
the relationship with the pre-averaging approach. The idea of using disjoint av- 
erages is present in Podolskij and Vetter (2009), where in our terminology Haar 
functions are used as They were aware of the fact that discretized sine func- 
tions would slightly increase the Fisher information (personal communication, see 
also their discussion after Corollary 2), but they have not used higher frequencies. 

Since we use the concrete coupling by linear interpolation to define $y'_{jk}$ in $\mathcal{E}_0$ and since convergence in total variation is stronger than weak convergence, all asymptotics for probabilities and weak convergence results for functionals $F((y_{jk})_{jk})$ in $\mathcal{E}_2$ remain true for $F((y'_{jk})_{jk})$ in $\mathcal{E}_0$, uniformly over the parameter class. The formal argument for the latter is that whenever $\|P_n - Q_n\|_{TV} \to 0$ and $P_n^{X_n} \to P^X$ weakly for some random variables $X_n$ we have for all bounded and continuous $g$

$$\mathbb{E}_{Q_n}[g(X_n)] = \mathbb{E}_{P_n}[g(X_n)] + O(\|g\|_\infty\|P_n - Q_n\|_{TV}) \to \mathbb{E}_P[g(X)].$$



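Thus, for $\alpha > 1/2$, $\sigma_0^2 > 0$ and $b \sim (n^{-1/2}\log n)^{1/(2\alpha+1)}$ the estimator

$$\hat\sigma_n^2(t) := \frac{\delta}{2b\sqrt{n}}\sum_{k:\,|k\delta n^{-1/2} - t| \le b}\bigl(n\delta^{-2}\pi^2(y'_{1k})^2 - \pi^2\bigr) \tag{8.2}$$

satisfies in the regression experiment $\mathcal{E}_0$

$$\lim_{n\to\infty}\ \inf_{\sigma^2 \in \mathcal{S}(\alpha,R,\sigma_0^2)} \mathbb{P}_{\sigma^2,n}\bigl(n^{\alpha/(4\alpha+2)}(\log n)^{-1}\|\hat\sigma_n^2 - \sigma^2\|_\infty \le R\bigr) = 1. \tag{8.3}$$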

The asymptotic equivalence can be applied to construct estimators for the integrated volatility $\int_0^1\sigma^2(t)\,dt$ or more generally $p$-th order integrals $\int_0^1\sigma^p(t)\,dt$ using the approach developed by Ibragimov and Khas'minskii (1991) for white noise models like $\mathcal{E}_3$. In our notation their Theorem 7.1 yields an estimator $\hat\vartheta_{p,n}$ of $\int_0^1\sigma^p(t)\,dt$ in $\mathcal{E}_3$ such that

$$n^{1/4}\Bigl(\hat\vartheta_{p,n} - \int_0^1\sigma^p(t)\,dt\Bigr) \xrightarrow{d} N\Bigl(0,\ 2\delta p^2\int_0^1\sigma^{2p-1}(t)\,dt\Bigr)$$

holds uniformly over $\sigma^2 \in \mathcal{S}(\alpha,R,\sigma_0^2)$ for any $\alpha, R, \sigma_0^2 > 0$ since the functional $\sqrt{2\sigma(\cdot)} \mapsto \int_0^1\sigma^p(t)\,dt$ is smooth on $L^2$. A LAN-result shows that asymptotic normality with rate $n^{-1/4}$ and variance $2\delta p^2\int_0^1\sigma^{2p-1}(t)\,dt$ is minimax optimal. Specializing to the case $p = 2$ for integrated volatility, the asymptotic variance is $8\delta\int_0^1\sigma^3(t)\,dt$. It should be stressed here that the existing estimation procedures for integrated volatility are globally sub-optimal for our idealized model in the sense that their asymptotic variances involve the integrated quarticity $\int_0^1\sigma^4(t)\,dt$, which can at most yield optimal variance for constant values of $\sigma^2$, because otherwise $\int_0^1\sigma^4(t)\,dt > (\int_0^1\sigma^3(t)\,dt)^{4/3}$ follows from Jensen's inequality. The fundamental reason is that all these estimators are based on quadratic forms of the increments depending on global tuning parameters, whereas optimizing weights locally allows us to attain the above efficiency bound as we shall see.
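
The Jensen gap between the integrated quarticity and the corresponding power of the cubic integral can be made concrete with a small numerical sketch. The helper names and the example volatility $\sigma^2(t) = 1 + t$ are illustrative assumptions.

```python
import math

def integral(f, steps=100000):
    """Midpoint rule on [0, 1]."""
    return sum(f((i + 0.5) / steps) for i in range(steps)) / steps

def quarticity_gap(sigma2):
    """Return (int sigma^4 dt, (int sigma^3 dt)^(4/3)) for a volatility function sigma^2."""
    quarticity = integral(lambda t: sigma2(t) ** 2)
    cubic = integral(lambda t: sigma2(t) ** 1.5)
    return quarticity, cubic ** (4.0 / 3.0)

print(quarticity_gap(lambda t: 1.0 + t))  # strict inequality for non-constant volatility
print(quarticity_gap(lambda t: 2.0))      # equality (up to rounding) for constant volatility
```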
Instead of following these more abstract approaches, we use our analysis to construct a simple estimator of the integrated volatility with optimal asymptotic variance. First we use the statistics $(y_{jk})$ in $\mathcal{E}_2$ and then transfer the results to $\mathcal{E}_0$ using $(y'_{jk})$ from (8.1).

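On each block $k$ we dispose in $\mathcal{E}_2$ of independent $N(0, h^2\pi^{-2}j^{-2}\sigma^2(kh) + \varepsilon^2)$-observations $y_{jk}$ for $j \ge 1$. A maximum-likelihood estimator $\hat\sigma^2(kh)$ in this exponential family satisfies the estimating equation (recall $h = h_0\varepsilon$)

$$\hat\sigma^2(kh) = \sum_{j\ge1} w_{jk}(\hat\sigma^2)\,h^{-2}\pi^2j^2\,(y_{jk}^2 - \varepsilon^2), \quad\text{where } w_{jk}(\sigma^2) := \frac{(\sigma^2(kh) + h_0^{-2}\pi^2j^2)^{-2}}{\sum_{l\ge1}(\sigma^2(kh) + h_0^{-2}\pi^2l^2)^{-2}}. \tag{8.5}$$

This can be solved numerically, yet it is a non-convex problem (personal communication by J. Schmidt-Hieber). Classical MLE-theory, however, asserts for fixed $h, k$ and consistent initial estimator $\hat\sigma_\varepsilon^2(kh)$ that only one Newton step suffices to ensure asymptotic efficiency. Because of $h \to 0$ this immediate argument does not apply here, but still gives rise to the estimator

$$\widehat{IV}_\varepsilon := \sum_{k=0}^{h^{-1}-1} h\sum_{j\ge1} w_{jk}(\hat\sigma_\varepsilon^2)\,h^{-2}\pi^2j^2\,(y_{jk}^2 - \varepsilon^2)$$

of the integrated volatility $IV := \int_0^1\sigma^2(t)\,dt$. Assuming the $L^\infty$-consistency $\|\hat\sigma_\varepsilon^2 - \sigma^2\|_\infty \to 0$ in probability for the initial estimator, we assert in $\mathcal{E}_2$ the efficiency result

$$\varepsilon^{-1/2}(\widehat{IV}_\varepsilon - IV) \xrightarrow{d} N\Bigl(0,\ 8\int_0^1\sigma^3(t)\,dt\Bigr).$$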

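To prove this, it suffices by Slutsky's lemma to show

$$\varepsilon^{-1/2}\Bigl(\sum_{k=0}^{h^{-1}-1} h\sum_{j\ge1} w_{jk}(\sigma^2)\,h^{-2}\pi^2j^2\,(y_{jk}^2 - \varepsilon^2) - IV\Bigr) \xrightarrow{d} N\Bigl(0,\ 8\int_0^1\sigma^3(t)\,dt\Bigr), \tag{8.6}$$

$$\sup_{j,k}\frac{|w_{jk}(\hat\sigma_\varepsilon^2) - w_{jk}(\sigma^2)|}{w_{jk}(\sigma^2)} \lesssim \|\hat\sigma_\varepsilon^2 - \sigma^2\|_\infty. \tag{8.7}$$

The second assertion (8.7) follows from inserting the Lipschitz property that $W(x) := (x + h_0^{-2}\pi^2j^2)^{-2}$ satisfies $|W'(x)| \lesssim W(x)$ and thus $|W(x) - W(y)| \lesssim W(x)|x - y|$ uniformly over $x, y \ge \sigma_0^2 > 0$.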
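For the first assertion (8.6) note that in $\mathcal{E}_2$ the estimator $\widehat{IV}_\varepsilon$ with the deterministic weights $w_{jk}(\sigma^2)$ is unbiased and, because $\operatorname{Var}(y_{jk}^2) = 2(h^2\pi^{-2}j^{-2}\sigma^2(kh) + \varepsilon^2)^2$, has variance

$$\operatorname{Var}(\widehat{IV}_\varepsilon) = \sum_{k=0}^{h^{-1}-1} h^2\sum_{j\ge1} 2\,w_{jk}(\sigma^2)^2\bigl(\sigma^2(kh) + h_0^{-2}\pi^2j^2\bigr)^2 = \sum_{k=0}^{h^{-1}-1}\frac{2h^2}{\sum_{j\ge1}(\sigma^2(kh) + h_0^{-2}\pi^2j^2)^{-2}}$$

such that by formula (9.14) and Riemann sum approximation as $h_0 \to \infty$ (with arbitrary speed)

$$\varepsilon^{-1}\operatorname{Var}(\widehat{IV}_\varepsilon) \to 8\int_0^1\sigma^3(t)\,dt.$$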
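
The limit of the scaled variance can be explored numerically. The sketch below assumes the oracle setting, i.e. weights proportional to $(\sigma^2(kh) + h_0^{-2}\pi^2j^2)^{-2}$ computed from the true volatility, for which the variance on each block equals $2h^2$ divided by the sum of the unnormalised weights. The function name and the truncation level `j_max` are illustrative choices.

```python
import math

def scaled_oracle_variance(sigma2, eps, h0, j_max=5000):
    """eps^-1 times sum_k 2 h^2 / sum_j (sigma^2(kh) + (pi j / h0)^2)^-2, with h = h0 * eps."""
    h = h0 * eps
    blocks = int(round(1.0 / h))
    variance = 0.0
    for k in range(blocks):
        s2 = sigma2(k * h)
        normaliser = sum((s2 + (math.pi * j / h0) ** 2) ** -2 for j in range(1, j_max + 1))
        variance += 2.0 * h * h / normaliser
    return variance / eps

for h0 in (10, 100, 1000):
    # constant volatility sigma^2 = 1, so the limit 8 * int sigma^3 dt equals 8
    print(h0, scaled_oracle_variance(lambda t: 1.0, 1e-3, h0))
```

For constant $\sigma^2 = 1$ the printed values decrease towards 8 as $h_0$ grows, which illustrates why $h_0 \to \infty$ is needed for the efficiency bound.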

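Due to the independence and Gaussianity of the $(y_{jk})$ we deduce also

$$\mathbb{E}\Bigl[\Bigl(\sum_{j\ge1} w_{jk}(\sigma^2)\,h^{-2}\pi^2j^2\,(y_{jk}^2 - \mathbb{E}[y_{jk}^2])\Bigr)^4\Bigr] \lesssim \operatorname{Var}\Bigl(\sum_{j\ge1} w_{jk}(\sigma^2)\,h^{-2}\pi^2j^2\,(y_{jk}^2 - \varepsilon^2)\Bigr)^2,$$

such that the central limit theorem under a Lyapounov condition with power $p = 4$ (e.g. Shiryaev (1995)) proves assertion (8.6), assuming $h \to 0$ and $h_0 \to \infty$. A feasible estimator is obtained by neglecting frequencies larger than some $J = J(\varepsilon)$:

$$\widehat{IV}_{\varepsilon,J} := \sum_{k=0}^{h^{-1}-1} h\sum_{j=1}^{J} w^J_{jk}(\hat\sigma_\varepsilon^2)\,h^{-2}\pi^2j^2\,(y_{jk}^2 - \varepsilon^2),$$

$$\text{where } w^J_{jk}(\sigma^2) := \frac{(\sigma^2(kh) + h_0^{-2}\pi^2j^2)^{-2}}{\sum_{l=1}^{J}(\sigma^2(kh) + h_0^{-2}\pi^2l^2)^{-2}}. \tag{8.9}$$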
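
A simulation sketch of this feasible spectral estimator, run directly in the Gaussian block model (independent $y_{jk} \sim N(0, h^2\pi^{-2}j^{-2}\sigma^2(kh) + \varepsilon^2)$ with $h = h_0\varepsilon$). All function names are hypothetical, and the pilot estimator used here (a moving average of the first-frequency statistics, clipped away from zero) is an assumption made for illustration, not the paper's choice.

```python
import math
import random

def simulate_blocks(sigma2, eps, h0, J, rng):
    """Draw y_jk for j = 1..J on blocks of length h = h0 * eps; returns (rows, h)."""
    h = h0 * eps
    blocks = int(round(1.0 / h))
    rows = [[rng.gauss(0.0, math.sqrt((h / (math.pi * j)) ** 2 * sigma2(k * h) + eps**2))
             for j in range(1, J + 1)] for k in range(blocks)]
    return rows, h

def pilot(rows, h, eps, window):
    """Crude initial estimator: moving average of (pi/h)^2 (y_1k^2 - eps^2) over nearby blocks."""
    raw = [(math.pi / h) ** 2 * (row[0] ** 2 - eps**2) for row in rows]
    out = []
    for k in range(len(raw)):
        lo, hi = max(0, k - window), min(len(raw), k + window + 1)
        out.append(max(sum(raw[lo:hi]) / (hi - lo), 1e-3))  # clip to keep weights well defined
    return out

def integrated_volatility(rows, h, eps, h0, sigma2_hat):
    """Sum over blocks of h times the weighted average of (pi j / h)^2 (y_jk^2 - eps^2)."""
    total = 0.0
    for row, s2 in zip(rows, sigma2_hat):
        raw_w = [(s2 + (math.pi * j / h0) ** 2) ** -2 for j in range(1, len(row) + 1)]
        norm = sum(raw_w)
        total += h * sum(w / norm * (math.pi * j / h) ** 2 * (y**2 - eps**2)
                         for j, (w, y) in enumerate(zip(raw_w, row), start=1))
    return total

rng = random.Random(0)
eps, h0, J = 1e-4, 20, 100
rows, h = simulate_blocks(lambda t: 1.0 + 0.5 * math.sin(2.0 * math.pi * t), eps, h0, J, rng)
print(integrated_volatility(rows, h, eps, h0, pilot(rows, h, eps, window=25)))  # true IV = 1
```

Since every statistic $(\pi j/h)^2(y_{jk}^2 - \varepsilon^2)$ is unbiased for $\sigma^2(kh)$, the weights only affect the variance; this is the reason a rough but consistent pilot estimator is enough.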



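A simple calculation yields $\mathbb{E}[|\widehat{IV}_{\varepsilon,J} - \widehat{IV}_\varepsilon|^2] \lesssim \varepsilon(h_0/J)^3$ such that for $h_0/J \to 0$ convergence in probability implies again by Slutsky's lemma

$$\varepsilon^{-1/2}(\widehat{IV}_{\varepsilon,J} - IV) \xrightarrow{d} N\Bigl(0,\ 8\int_0^1\sigma^3(t)\,dt\Bigr).$$

By the above argument, weak convergence results transfer from $\mathcal{E}_2$ to $\mathcal{E}_0$ and we obtain the following result where we give a concrete choice of the initial estimator, the block size $h$ and the spectral cut-off $J$ (we just need some consistent estimator $\hat\sigma_n^2$, $h^\alpha n^{1/4} \to 0$ as well as $hn^{1/2} \to \infty$ and $J^{-1} = o(h^{-1}n^{-1/2})$).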

8.1 Theorem. Let $y'_{jk}$ for $j \ge 1$, $k = 0, \dots, h^{-1} - 1$ be the statistics (8.1) from model $\mathcal{E}_0$. For $h \sim n^{-1/2}\log(n)$ and $J/\log(n) \to \infty$ consider the estimator of integrated volatility

$$\widehat{IV}_n := \sum_{k=0}^{h^{-1}-1} h\sum_{j=1}^{J} w^J_{jk}(\hat\sigma_n^2)\,h^{-2}\pi^2j^2\,\bigl((y'_{jk})^2 - \delta^2n^{-1}\bigr)$$

with weights $w^J_{jk}$ from (8.9) and the initial estimator $\hat\sigma_n^2$ from (8.2). Then $\widehat{IV}_n$ is asymptotically efficient in the sense that

$$n^{1/4}(\widehat{IV}_n - IV) \xrightarrow{d} N\Bigl(0,\ 8\delta\int_0^1\sigma^3(t)\,dt\Bigr) \quad\text{as } n \to \infty,$$

provided $\sigma^2$ is strictly positive and $\alpha$-Hölder continuous with $\alpha > 1/2$.

This might serve as a benchmark for more general models, whereas we, in the spirit of Mykland (2009), focus on elucidating the underlying fundamental structures. In particular, we should dispense with the Gaussianity of the microstructure noise $(\varepsilon_i)$ as well as with the deterministic nature of the volatility $\sigma^2$. The analysis in both cases, however, cannot simply rely on model $\mathcal{E}_2$, since $\mathcal{E}_0$ is non-Gaussian. Different tools are required.

9 Appendix 

9.1 Gaussian measures, Hellinger distance and Hilbert-Schmidt norm

We gather basic facts about cylindrical Gaussian measures, the Hellinger distance 
and their interplay. 

Formally, we realize the white noise experiments as $L^2$-indexed Gaussian variables, e.g. in experiment $\mathcal{E}_1$ we observe for any $f \in L^2([0,1])$

$$Y_f := \langle f, dY\rangle := \int_0^1 f(t)\Bigl(\int_0^t\sigma(s)\,dB(s)\Bigr)dt + \varepsilon\int_0^1 f(t)\,dW(t).$$

Canonically, we thus define $P^{\sigma,\varepsilon}$ on the set $\Omega = \mathbb{R}^{L^2([0,1])}$ with product Borel $\sigma$-algebra $\mathcal{F} = \mathcal{B}^{\otimes L^2([0,1])}$ (realizing a cylindrical centered Gaussian measure). Its covariance structure is given by

$$\mathbb{E}[Y_fY_g] = \langle Cf, g\rangle, \quad f, g \in L^2([0,1]),$$

with the covariance operator $C : L^2([0,1]) \to L^2([0,1])$ given by

$$Cf(t) = \int_0^1\Bigl(\int_0^{t\wedge u}\sigma^2(s)\,ds\Bigr)f(u)\,du + \varepsilon^2f(t), \quad f \in L^2([0,1]).$$

Note that $C$ is not trace class and thus does not define a Gaussian measure on $L^2([0,1])$ itself.

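In the construction, it suffices to prescribe $(Y_{e_m})_{m\ge1}$ for an orthonormal basis $(e_m)_{m\ge1}$ and to set

$$Y_f := \sum_{m=1}^\infty\langle f, e_m\rangle Y_{e_m}.$$

This way, we can define $P^{\sigma,\varepsilon}$ equivalently on the sequence space $\Omega = \mathbb{R}^{\mathbb{N}}$ with product $\sigma$-algebra $\mathcal{F} = \mathcal{B}^{\otimes\mathbb{N}}$. This is useful when extending results from finite dimensions.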

The Hellinger distance between two probability measures $P$ and $Q$ on $(\Omega, \mathcal{F})$ is defined as

$$H(P,Q) := \Bigl(\int(\sqrt{p} - \sqrt{q})^2\,d\mu\Bigr)^{1/2},$$

where $\mu$ denotes a dominating measure, e.g. $\mu = P + Q$, and $p$ and $q$ denote the respective densities. The total variation distance is smaller than the Hellinger distance:

$$\|P - Q\|_{TV} \le H(P,Q). \tag{9.1}$$

The identity $H^2(P,Q) = 2 - 2\int\sqrt{pq}\,d\mu$ implies the bound for finite or countably infinite product measures

$$H^2\Bigl(\bigotimes_n P_n, \bigotimes_n Q_n\Bigr) \le \sum_n H^2(P_n, Q_n). \tag{9.2}$$

Moreover, the Hellinger distance is invariant under bi-measurable bijections $T : \Omega \to \Omega'$ since with the densities $p\circ T^{-1}$, $q\circ T^{-1}$ of the image measures $P^T$ and $Q^T$ with respect to $\mu^T$ we have

$$H^2(P^T, Q^T) = \int_{\Omega'}\bigl(\sqrt{p\circ T^{-1}} - \sqrt{q\circ T^{-1}}\bigr)^2d\mu^T = \int_\Omega(\sqrt{p} - \sqrt{q})^2\,d\mu = H^2(P,Q). \tag{9.3}$$

For the one-dimensional Gaussian laws $N(0,1)$ and $N(0,\sigma^2)$ we derive

$$H^2(N(0,1), N(0,\sigma^2)) = 2 - \sqrt{8\sigma/(\sigma^2+1)} \le 2(\sigma^2 - 1)^2.$$
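
The closed form for the scalar Gaussian case can be cross-checked against a direct numerical integration of $\int(\sqrt{p} - \sqrt{q})^2$. The following sketch uses illustrative function names and a plain midpoint rule on a truncated interval.

```python
import math

def hellinger_sq_closed_form(sigma):
    """H^2(N(0,1), N(0,sigma^2)) = 2 - sqrt(8 sigma / (sigma^2 + 1))."""
    return 2.0 - math.sqrt(8.0 * sigma / (sigma**2 + 1.0))

def hellinger_sq_numeric(sigma, half_width=60.0, steps=200000):
    """Midpoint rule for the integral of (sqrt(p) - sqrt(q))^2 over [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    norm = math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        p = math.exp(-0.5 * x * x) / norm
        q = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * norm)
        total += (math.sqrt(p) - math.sqrt(q)) ** 2
    return total * dx

for sigma in (0.5, 1.5, 3.0):
    print(sigma, hellinger_sq_closed_form(sigma), hellinger_sq_numeric(sigma),
          2.0 * (sigma**2 - 1.0) ** 2)
```

The last printed column is the quadratic upper bound $2(\sigma^2 - 1)^2$, which is only sharp in order for $\sigma$ close to 1 but is all that the subsequent Hilbert-Schmidt estimates need.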
For the multi-dimensional Gaussian laws $N(0,\Sigma_1)$ and $N(0,\Sigma_2)$ with invertible covariance matrices $\Sigma_1, \Sigma_2 \in \mathbb{R}^{d\times d}$ we obtain by linear transformation and independence, denoting by $\lambda_1, \dots, \lambda_d$ the eigenvalues of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$:

$$H^2(N(0,\Sigma_1), N(0,\Sigma_2)) = H^2\bigl(N(0,\mathrm{Id}), N(0,\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})\bigr) \le \sum_{k=1}^d 2(\lambda_k - 1)^2.$$

The last sum is, up to the factor 2, nothing but the squared Hilbert-Schmidt (or Frobenius) norm of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} - \mathrm{Id}$ such that

$$H^2(N(0,\Sigma_1), N(0,\Sigma_2)) \le 2\|\Sigma_1^{-1/2}(\Sigma_2 - \Sigma_1)\Sigma_1^{-1/2}\|_{HS}^2. \tag{9.4}$$

Observing that (9.2) and (9.3) also apply to Gaussian measures on the sequence space $\mathbb{R}^{\mathbb{N}}$, the bound (9.4) is also valid for (cylindrical) Gaussian measures $N(0,\Sigma_i)$ with self-adjoint positive definite covariance operators $\Sigma_i : L^2([0,1]) \to L^2([0,1])$.

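The Hilbert-Schmidt norm of a linear operator $A : H \to H$ on any separable real Hilbert space $H$ can be expressed by its action on an orthonormal basis $(e_m)$ via

$$\|A\|_{HS}^2 = \sum_{m,n}\langle Ae_m, e_n\rangle^2,$$

which for a matrix is just the usual Frobenius norm. For self-adjoint operators $A, B$ with $|\langle Av, v\rangle| \le |\langle Bv, v\rangle|$ for all $v \in H$ we use the eigenbasis $(e_m)$ of $A$ and obtain

$$\|A\|_{HS}^2 = \sum_m\langle Ae_m, e_m\rangle^2 \le \sum_m\langle Be_m, e_m\rangle^2 \le \|B\|_{HS}^2. \tag{9.5}$$

Furthermore, it is straightforward to see for any bounded operator $T$

$$\|TA\|_{HS} \le \|T\|\,\|A\|_{HS}, \quad \|AT\|_{HS} \le \|T\|\,\|A\|_{HS} \tag{9.6}$$

with the usual operator norm $\|T\|$ of $T$. Finally, for integral operators $Kf(x) = \int_0^1 k(x,y)f(y)\,dy$ on $L^2([0,1])$ it is well known that

$$\|K\|_{HS} = \|k\|_{L^2([0,1]^2)}. \tag{9.7}$$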
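
Identity (9.7) can be illustrated with the covariance operator of standard Brownian motion, whose kernel is $k(t,s) = t\wedge s$ and whose eigenvalues are $(\pi(k - 1/2))^{-2}$, $k \ge 1$. The sketch below (illustrative function names, simple midpoint quadrature) computes the squared Hilbert-Schmidt norm both ways; the exact value is $1/6$.

```python
import math

def hs_norm_sq_from_eigenvalues(terms=2000):
    """Sum of the squared eigenvalues (pi (k - 1/2))^-2 of the Brownian covariance operator."""
    return sum((math.pi * (k - 0.5)) ** -4 for k in range(1, terms + 1))

def hs_norm_sq_from_kernel(grid=400):
    """Midpoint rule for the double integral of min(t, s)^2 over the unit square."""
    pts = [(i + 0.5) / grid for i in range(grid)]
    return sum(min(t, s) ** 2 for t in pts for s in pts) / grid**2

print(hs_norm_sq_from_eigenvalues(), hs_norm_sq_from_kernel(), 1.0 / 6.0)
```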

For two Gaussian laws with different mean vectors $\mu_1, \mu_2$ and with the same invertible covariance matrix $\Sigma$ we can similarly use the transformation $\Sigma^{-1/2}$ and the scalar case $H^2(N(m_1,1), N(m_2,1)) = 2(1 - e^{-(m_1-m_2)^2/8}) \le (m_1 - m_2)^2/4$ to conclude by independence

$$H^2(N(\mu_1,\Sigma), N(\mu_2,\Sigma)) \le \tfrac14\|\Sigma^{-1/2}(\mu_1 - \mu_2)\|^2. \tag{9.8}$$

Combining (9.4) and (9.8) we obtain by the triangle inequality the bound

$$H^2(N(\mu_1,\Sigma_1), N(\mu_2,\Sigma_2)) \lesssim \|\Sigma_1^{-1/2}(\mu_1 - \mu_2)\|^2 + \|\Sigma_1^{-1/2}(\Sigma_2 - \Sigma_1)\Sigma_1^{-1/2}\|_{HS}^2. \tag{9.9}$$

9.2 Proof of Theorem 2.2

We first show that $\mathcal{E}_1$ is asymptotically at least as informative as $\mathcal{E}_0$ for $\varepsilon = \delta/\sqrt{n}$ and $\alpha > 0$. From $\mathcal{E}_1$ with $\varepsilon = \delta/\sqrt{n}$ we can generate the observations (statistics)

$$\tilde Y_i := n\int_{(2i-1)/2n}^{(2i+1)/2n}dY_t = n\int_{(2i-1)/2n}^{(2i+1)/2n}X_t\,dt + \tilde\varepsilon_i, \quad i = 1, \dots, n-1,$$

$$\tilde Y_n := 2n\int_{(2n-1)/2n}^{1}dY_t = 2n\int_{(2n-1)/2n}^{1}X_t\,dt + \tilde\varepsilon_n,$$

with $\tilde\varepsilon_i = n\varepsilon(W_{(2i+1)/2n} - W_{(2i-1)/2n}) \sim N(0,\delta^2)$ and similarly $\tilde\varepsilon_n \sim N(0,\delta^2)$, all independent. In contrast to standard equivalence proofs, it turns out to be essential here to take $\tilde Y_i$ as a mean symmetric around the point $i/n$. Since $(\tilde Y_i)$ and $(Y_i)$ are defined on the same sample space, using inequality (9.1) it suffices to prove that the Hellinger distance between the law of $(\tilde Y_i)$ and the law of $(Y_i)$ tends to zero as $n$ tends to infinity.

For the integrated volatility function we introduce the notation

$$a(t) := \int_0^t\sigma^2(s)\,ds, \quad 0 \le t \le 1.$$

For notational convenience we also set $a(1+s) := a(1-s)$ for $s > 0$.

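The covariance matrix $\Sigma^Y$ of the centered Gaussian vector $(Y_i)$ is given by

$$\Sigma^Y_{kl} := \mathbb{E}[Y_kY_l] = a(k/n) + \delta^2\mathbf{1}(k = l), \quad 1 \le k \le l \le n.$$

Similarly, the covariance matrix $\Sigma^{\tilde Y}$ of the centered Gaussian vector $(\tilde Y_i)$ is given by

$$\Sigma^{\tilde Y}_{kl} := \mathbb{E}[\tilde Y_k\tilde Y_l] = n\int_{(2k-1)/2n}^{(2k+1)/2n}a(t)\,dt + \delta^2\mathbf{1}(k = l), \quad 1 \le k \le l \le n,$$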
where for $k = l = n$ we used the convention for $a(1+s)$ above. We bound the Hellinger distance using consecutively (9.4), the lower bound $\delta^2\,\mathrm{Id}$ for the covariance matrix of $(Y_i)$ in (9.5) and (9.2), a Taylor expansion for $a$ and treating the case $k = l = n$ by a Lipschitz bound separately:

$$H^2\bigl(\mathcal{L}(Y_i, i = 1, \dots, n), \mathcal{L}(\tilde Y_i, i = 1, \dots, n)\bigr) \le 4\delta^{-4}\sum_{1\le k\le l\le n}\Bigl(n\int_{(2k-1)/2n}^{(2k+1)/2n}(a(t) - a(k/n))\,dt\Bigr)^2$$

$$\le 4\delta^{-4}\Bigl(O(R^2n^{-2}) + n\sum_{k=1}^{n-1}\Bigl(n\int_{(2k-1)/2n}^{(2k+1)/2n}\bigl(a'(k/n)(t - k/n) + O(Rn^{-1-\alpha})\bigr)dt\Bigr)^2\Bigr)$$

$$= 4\delta^{-4}\bigl(O(R^2n^{-2}) + O(R^2n^{2-2-2\alpha})\bigr) = O(\delta^{-4}R^2n^{-2\alpha}).$$



Consequently, by (9.1) the total-variation and thus also the Le Cam distance between the experiments of observing $(\tilde Y_i)$ and of observing $(Y_i)$ tends to zero for $n \to \infty$, which proves that the white noise experiment $\mathcal{E}_1$ is asymptotically at least as informative as the regression experiment $\mathcal{E}_0$.

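To show the converse, we build from the regression experiment $\mathcal{E}_0$ a continuous time observation by linear interpolation. To this end we introduce the linear B-splines (or hat functions) $b_i(t) = b(t - i/n)$ with $b(t) = \min(1 + nt, 1 - nt)\mathbf{1}_{[-1/n,1/n]}(t)$ and set

$$\hat Y'_t := \sum_{i=1}^n Y_ib_i(t) = \sum_{i=1}^n X_{i/n}b_i(t) + \sum_{i=1}^n\varepsilon_ib_i(t), \quad t \in [0,1].$$

Note that $(\hat Y'_t)$ is a centered Gaussian process with covariance function

$$c(t,s) := \mathbb{E}[\hat Y'_t\hat Y'_s] = \sum_{i,j=1}^n a((i\wedge j)/n)\,b_i(t)b_j(s) + \delta^2\sum_{i=1}^n b_i(t)b_i(s).$$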
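For any $f \in L^2([0,1])$ we thus obtain

$$\mathbb{E}[\langle f, \hat Y'\rangle^2] = \sum_{i,j=1}^n a((i\wedge j)/n)\langle f, b_i\rangle\langle f, b_j\rangle + \delta^2\sum_{i=1}^n\langle f, b_i\rangle^2 \le \sum_{i,j=1}^n a((i\wedge j)/n)\langle f, b_i\rangle\langle f, b_j\rangle + \delta^2n^{-1}\|f\|_{L^2}^2,$$

because $\int nb_i = 1$ yields by Jensen's inequality $\langle f, nb_i\rangle^2 \le \langle f^2, nb_i\rangle$ and we have $\sum_i b_i \le 1$. This means that the covariance operator $\hat C$ induced by the kernel $c$ is smaller than

$$\bar Cf := \sum_{i,j=1}^n a((i\wedge j)/n)\langle f, b_i\rangle b_j + \frac{\delta^2}{n}f$$

in the sense that $\bar C - \hat C$ is positive (semi-)definite. Now observe that $\bar C$ is the covariance operator of the white noise observations

$$d\bar Y_t = \sum_{i=1}^n X_{i/n}b_i(t)\,dt + \frac{\delta}{\sqrt{n}}\,dW_t, \quad t \in [0,1]. \tag{9.10}$$

Hence, we can generate these observations from $(\hat Y'_t)$ by randomisation, i.e. by adding uninformative $N(0, \bar C - \hat C)$-noise to $\hat Y'$. Now it is easy to see that observing $\bar Y$ in (9.10) and $Y$ from $\mathcal{E}_1$ is asymptotically equivalent, since in terms of the respective covariance operators $\bar C$ and $C^Y$, using again (9.4), (9.5) and (9.2), the squared Hellinger distance satisfies

$$H^2(\mathcal{L}(\bar Y), \mathcal{L}(Y)) \le 2\|(C^Y)^{-1/2}(\bar C - C^Y)(C^Y)^{-1/2}\|_{HS}^2$$

$$\le 2\delta^{-4}n^2\int_0^1\!\!\int_0^1\Bigl(a(t\wedge s) - \sum_{i,j=1}^n a((i\wedge j)/n)\,b_i(t)b_j(s)\Bigr)^2dt\,ds$$

$$= 2\delta^{-4}n^2\int_0^1\!\!\int_0^1\Bigl(\sum_{i,j=0}^n\bigl(a(t\wedge s) - a((i\wedge j)/n)\bigr)b_i(t)b_j(s)\Bigr)^2dt\,ds,$$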

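where for the last line we have used $\sum_{i=0}^n b_i(t) = 1$ and $a(0) = 0$. Since $b_i(t) \ne 0$ can only hold when $i - \lfloor nt\rfloor \in \{0, 1\}$, the $\alpha$-Hölder regularity of $\sigma^2$ implies for $t \le s - 1/n$:

$$\Bigl(\sum_{i,j=0}^n\bigl(a(t\wedge s) - a((i\wedge j)/n)\bigr)b_i(t)b_j(s)\Bigr)^2$$

$$= \Bigl(\sum_{k,l=0}^1\bigl(a'(\lfloor nt\rfloor/n)(t - (k + \lfloor nt\rfloor)/n) + O(Rn^{-1-\alpha})\bigr)b_{k+\lfloor nt\rfloor}(t)\,b_{l+\lfloor ns\rfloor}(s)\Bigr)^2$$

$$= O(R^2n^{-2-2\alpha}) + \Bigl(a'(\lfloor nt\rfloor/n)\sum_{k=0}^1(t - (k + \lfloor nt\rfloor)/n)\,b_{k+\lfloor nt\rfloor}(t)\Bigr)^2 = O(R^2n^{-2-2\alpha}),$$

where the last sum vanishes because linear interpolation reproduces linear functions. A symmetric argument gives the same bound for $s \le t - 1/n$. For $|t - s| < 1/n$ we use only the Lipschitz continuity of $a$ to obtain the bound $O(R^2n^{-2})$. Altogether we have found that the squared Hellinger distance is at most

$$2\delta^{-4}n^2\bigl(O(R^2n^{-2-2\alpha}) + n^{-1}O(R^2n^{-2})\bigr) = O(\delta^{-4}R^2n^{-2\alpha}),$$

which together with the transformation in the other direction shows that the Le Cam distance between $\mathcal{E}_0$ and $\mathcal{E}_1$ is of order $O(\delta^{-2}Rn^{-\alpha})$.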
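
Two elementary properties of the hat functions carry this argument: they form a partition of unity on $[0,1]$ when the index $i = 0$ is included, and the interpolant of a linear function is that function itself. A short Python check, with illustrative function names:

```python
def hat(i, n, t):
    """Hat function b_i(t) = b(t - i/n), b(t) = min(1 + n t, 1 - n t) on [-1/n, 1/n], else 0."""
    return max(0.0, 1.0 - abs(n * t - i))

def partition_of_unity(n, t):
    """Sum of b_i(t) over i = 0, ..., n; equals 1 for t in [0, 1]."""
    return sum(hat(i, n, t) for i in range(0, n + 1))

def interpolate_identity(n, t):
    """Interpolant sum_i (i/n) b_i(t) of the linear function t -> t."""
    return sum((i / n) * hat(i, n, t) for i in range(0, n + 1))

n = 10
for t in (0.0, 0.137, 0.5, 0.999, 1.0):
    print(t, partition_of_unity(n, t), interpolate_identity(n, t))
```

The second property is what makes the first-order Taylor term drop out of the bound, leaving only the Hölder remainder.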

9.3 Proof of Proposition 

The main tool is Proposition 9.1 below. Together with the Hölder bound

$$|\sigma^2(\lfloor s\rfloor_h) - \sigma^2(s)| \le Rh^\alpha, \quad s \in [0,1],$$

it implies that for fixed $\sigma$ the observation laws in $\mathcal{E}_1$ and $\mathcal{E}_2$ have a Hellinger distance of order $Rh^\alpha\sigma_0^{-3/2}\varepsilon^{-1/2}$. By inequality (9.1) this translates to the total variation and thus to the Le Cam distance.

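9.1 Proposition. For $\varepsilon > 0$ and continuous $\sigma : [0,1] \to (0,\infty)$ consider the law $P^{\sigma,\varepsilon}$ generated by

$$dY_t = \Bigl(\int_0^t\sigma(s)\,dB(s)\Bigr)dt + \varepsilon\,dW_t, \quad t \in [0,1],$$

with independent Brownian motions $B$ and $W$. Then the Hellinger distance between two laws $P^{\sigma_1,\varepsilon}$ and $P^{\sigma_2,\varepsilon}$ satisfies

$$H(P^{\sigma_1,\varepsilon}, P^{\sigma_2,\varepsilon}) \lesssim \|\sigma_1^2 - \sigma_2^2\|_\infty\,\max_t\sigma_1^{-3/2}(t)\,\varepsilon^{-1/2}.$$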
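Proof. The covariance operator $C_\sigma$ of $P^{\sigma,\varepsilon}$ is for $f, g \in L^2([0,1])$ given by

$$\langle C_\sigma f, g\rangle = \mathbb{E}[\langle f, dY\rangle\langle g, dY\rangle] = \mathbb{E}[\langle f, X\rangle\langle g, X\rangle] + \varepsilon^2\langle f, g\rangle = \int FG\sigma^2 + \varepsilon^2\int fg.$$

For covariance operators corresponding to $\sigma_1, \sigma_2$ we have with $F(t) = -\int_t^1 f(s)\,ds$ by twofold partial integration

$$\langle(C_{\sigma_1} - C_{\sigma_2})f, f\rangle = \int_0^1\!\!\int_0^1\!\!\int_0^{t\wedge s}(\sigma_1^2 - \sigma_2^2)(u)\,du\,f(t)f(s)\,ds\,dt = \int_0^1F(u)^2(\sigma_1^2 - \sigma_2^2)(u)\,du$$

$$\le \|\sigma_1^2 - \sigma_2^2\|_\infty\int_0^1F(u)^2\,du = \|\sigma_1^2 - \sigma_2^2\|_\infty\langle C_{BM}f, f\rangle$$

with $C_{BM}g(t) = \int_0^1(t\wedge s)g(s)\,ds$, the covariance operator of standard Brownian motion. Using further the ordering $C_{\sigma_1} \ge \min_t\sigma_1^2(t)C_{BM} + \varepsilon^2\,\mathrm{Id}$ and (9.4), (9.5), (9.2) we obtain

$$H(P^{\sigma_1,\varepsilon}, P^{\sigma_2,\varepsilon}) \le \sqrt{2}\,\|C_{\sigma_1}^{-1/2}(C_{\sigma_1} - C_{\sigma_2})C_{\sigma_1}^{-1/2}\|_{HS}$$

$$\le \sqrt{2}\,\|\sigma_1^2 - \sigma_2^2\|_\infty\|C_{\sigma_1}^{-1/2}C_{BM}C_{\sigma_1}^{-1/2}\|_{HS}$$

$$\le \sqrt{2}\,\|\sigma_1^2 - \sigma_2^2\|_\infty\bigl\|(\min_t\sigma_1^2(t)C_{BM} + \varepsilon^2\,\mathrm{Id})^{-1/2}C_{BM}(\min_t\sigma_1^2(t)C_{BM} + \varepsilon^2\,\mathrm{Id})^{-1/2}\bigr\|_{HS}$$

$$= \sqrt{2}\,\|\sigma_1^2 - \sigma_2^2\|_\infty\|F(C_{BM})\|_{HS},$$

employing functional calculus with $F(x) = (\min_t\sigma_1^2(t)x + \varepsilon^2)^{-1}x$. The spectral properties of $C_{BM}$ imply that $F(C_{BM})$ has eigenfunctions $e_k(t) = \sqrt{2}\sin(\pi(k - 1/2)t)$, $k \ge 1$, with eigenvalues $\lambda_k = \dfrac{4}{4\min_t\sigma_1^2(t) + (2k-1)^2\pi^2\varepsilon^2}$, whence its Hilbert-Schmidt norm is of order $\max_t\sigma_1^{-3/2}(t)\,\varepsilon^{-1/2}$. This yields the result. □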
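
The order of this Hilbert-Schmidt norm can be checked numerically. Writing $m = \min_t\sigma_1^2(t)$, the sum $\sum_k\lambda_k^2 = \sum_k(m + \pi^2\varepsilon^2(k - 1/2)^2)^{-2}$ is a midpoint Riemann sum for $\int_0^\infty(m + \pi^2\varepsilon^2x^2)^{-2}dx = m^{-3/2}/(4\varepsilon)$. The sketch below uses illustrative function names and a finite truncation of the series.

```python
import math

def hs_norm_sq(min_sigma2, eps, terms=200000):
    """Sum of squared eigenvalues lambda_k = 4 / (4 m + (2k - 1)^2 pi^2 eps^2), truncated."""
    return sum((4.0 / (4.0 * min_sigma2 + ((2 * k - 1) * math.pi * eps) ** 2)) ** 2
               for k in range(1, terms + 1))

def integral_approximation(min_sigma2, eps):
    """Closed-form value m^(-3/2) / (4 eps) of the approximating integral."""
    return min_sigma2 ** -1.5 / (4.0 * eps)

for m, eps in ((1.0, 1e-3), (4.0, 1e-2)):
    print(m, eps, hs_norm_sq(m, eps), integral_approximation(m, eps))
```

Taking square roots gives a Hilbert-Schmidt norm of about $\tfrac12 m^{-3/4}\varepsilon^{-1/2}$, in line with the stated order $\sigma^{-3/2}\varepsilon^{-1/2}$.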
9.4 Proof of Proposition 5.2

We only consider the case of odd indices A:, both cases are treated analogously. 
The result of Theorem 6.1 in conjunction with Theorem 5.2 of Grama and Nussbaum (2002) establishes that 1^3°^^ and the Gaussian regression experiment of
observing 

Yk = v.s^kh) + /(a2(fc/i))-i/27fe, k e A„, odd, 7^ - N{0, 1) i.i.d. (9.11) 

are equivalent to experiments S'^^m — A^s^)s^eCa{R)) ^-^^d ^3^m = 

{'3^,'^, (Q™)s2ec„(/?))i respectively, on the same space (^^,^) such that 

sup ij2(p™,Q™) <r^'' (9.12) 

holds for all p < 1. 

To be precise, it must be checked that the regularity conditions R1-R3 of Grama and Nussbaum (2002) are satisfied for all parameter values $\vartheta$. One complication is that in our parametric model the probabilities $P_\vartheta$ and the Fisher information $I(\vartheta)$ depend on $h_0$ which tends to infinity. Yet, inspecting the proofs it becomes clear that the results remain valid if (a) the conditions R1-R3 hold for varying models, but with uniform constants and (b) the Fisher information is renormalized by the localisation such that the parametric rate (in our block length notation) is

attained. From the fact that the local experiment is the product of one-dimensional exponential family models we easily check condition R1 for $\delta = 1$ and condition R2 for any $\delta > 0$. Both conditions hold uniformly over $h_0$ once the score has been renormalized through multiplication by $h_0^{-1/2}$. In (5.2) we have already calculated the Fisher information and we infer directly condition R3 that $h_0^{-1}I(\vartheta)$ is uniformly bounded away from zero and infinity. We thus infer (9.12).

In view of the independence among the experiments (i^^m )m and equally among 
the experiments ($^3.m)m we infer from (|9.12p and (|9.2p 

sup i?^(®!fjrp2,®^tV 'C) < m-h-'" < e-'vX'vtr 

Since we assume ho = o{e^^~^°'^^^°'), the right-hand side tends to zero provided 
holds. Since p < 1 is arbitrary, this is always satisfied for a < 1. In the case a = 1
we use /iQ < e^"^ for some p < 1/2. We have derived asymptotic equivalence between 
the product experiments ®m<S'l°^ and ®rn^d.,m- A fortiori, applying the Brown and 
Low (1996) result, this leads to asymptotic equivalence between observing {yjk) in 
experiments (o2.ioc and the corresponding Gaussian shift models of observing 

dYt = I{(jl{t)f/^v,s^{t) dt + {2hf/^dWt, t e [0, 1]. (9.13) 



From the explicit form (5.2) of the Fisher information we infer for $h_0 \to \infty$



ho ' 4 2'dy^ho 

Consequently, by the polynomial growth of $h_0$ in $\varepsilon^{-1}$, the Kullback-Leibler divergence between the observation laws from (9.13) and the model $\mathcal{E}_{3,\mathrm{loc}}$ converges to zero. This gives the result.



9.5 Proof of Proposition 6.1

Since the observations yjk for j ^ 1 are the same in "3^ and we can work 
conditionally on those. Moreover, it suffices to consider only the event il^ 
{||f7^ — cr^ I loo ^ Rve} because the squared Hellinger distance satisfies (with ob- 
vious notation) 

H'{^m,J^{?V)) - E[H^{^{{yok)k I {yjk)j^i.k),.^{{yok)k \ {yjk)j^i.k))] 
^ E[H^^{iyok)k I {yjk)j^i,k),.^iiyok)k I {yjk)j^uk))lnA + 2V{n^^) 

with P(r2^) — > 0. Conditional on {yjk)j^i.k, both laws are Gaussian, (yo,fe)fe has 
mean fj, with 

^ Var(/3jfc) ^Var(/3^. . 
and covariance matrix $\Sigma$ with

r, £^Var(^jfc) ^2 if I,' — I, 



Sfc fc' — < 



Var(i/jfc) 



0, otherwise, 

where Cfe 1 V (2 — fc) G {1,2}. Conditional mean ji and covariance matrix E of 
iyak)k have the same representation, but replacing Var each time by Var^. 
From the tri-diagonal structure of S and from 
^ Var(/3,,) ^ jho/j)' 

we infer E > (e^/io + e^)Id ^ eft, Id in matrix order. Combining this with the 
Hellinger bound (|9.9I) we arrive at the estimate 



r, 1 1 2 



2 



£2/i2 

'£2var(/3jfe) e2yj^^^(^^.^),2 



< / Var(^j-fc) Varg(/3-,-fc) \ 2 Var(yjfc) ^ /£ 



e 



^■^"T^i^ ^Var(yjfe) Y&r^{yjk)J eh ^^^\ Var(yjfe) Vare(yj7c) 

The function G(z) := wl'^^^ltle- ^as derivative G'(z) = (||i^'^]|''r+'.). and thus 
satisfies uniformly over all z bounded away from zero \G(vj) — G{z)\ < ^l?^^ n-i^r-iyi ■ 

\ II ^ jk II I ^ / 

Inserting ~ ctqI ^ and ||$jfe|| ~ ft/j, we thus find the uniform boimd on fig 
Putting the estimates together, we arrive at 

2«2ft-i^min(fto/j,j7/io)'/^o' 

such that the Hellinger distance tends to zero uniformly if h^^v'^ — o(e), which 
is ensured by our choice of ho- This implies asymptotic equivalence of observing 
'3^ and '3^ and thus of experiment S2 and of just observing {yjk)j^i,k in S'2. By
independence the latter is equivalent to (^2,odd €5 S'2,even- 

9.6 An explicit series representation 

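We aim at deriving the formula

$$\sum_{j\ge1}\frac{1}{(\lambda^2 + \pi^2j^2)^2} = \frac{1 + 4\lambda e^{-2\lambda} - e^{-4\lambda}}{4\lambda^3(1 - e^{-2\lambda})^2} - \frac{1}{2\lambda^4} \tag{9.14}$$

for any $\lambda > 0$. We employ Fourier techniques and consider the Fourier coefficients of $g(x) = e^{-\lambda x/\pi}$:

$$\hat g(j) = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi}e^{-\lambda x/\pi}e^{ijx}\,dx = \frac{\pi(1 - e^{-2\lambda})}{\sqrt{2\pi}\,(\lambda - i\pi j)}.$$

For the $2\pi$-periodic convolution $g*g(x) = xe^{-\lambda x/\pi} + (2\pi - x)e^{-\lambda(2 + x/\pi)}$ we obtain the Fourier coefficient as a product:

$$\widehat{g*g}(j) = \sqrt{2\pi}\,\hat g(j)^2 = \frac{\pi^2(1 - e^{-2\lambda})^2}{\sqrt{2\pi}\,(\lambda - i\pi j)^2}.$$

The Parseval formula therefore yields

$$\sum_{j\in\mathbb{Z}}\frac{1}{(\lambda^2 + \pi^2j^2)^2} = \frac{2}{\pi^3(1 - e^{-2\lambda})^4}\sum_{j\in\mathbb{Z}}|\widehat{g*g}(j)|^2 = \frac{2}{\pi^3(1 - e^{-2\lambda})^4}\|g*g\|_{L^2}^2.$$

We infer that $(\lambda/\pi)^3\|g*g\|_{L^2}^2$ equals

$$\Bigl(\frac{\lambda}{\pi}\Bigr)^3\int_0^{2\pi}\bigl((1 - e^{-2\lambda})^2x^2 + 4\pi^2e^{-4\lambda} + 4\pi(e^{-2\lambda} - e^{-4\lambda})x\bigr)e^{-2\lambda x/\pi}\,dx = \bigl(\lambda e^{-2\lambda} + (1 - e^{-4\lambda})/4\bigr)(1 - e^{-2\lambda})^2$$

and thus obtain

$$\sum_{j\in\mathbb{Z}}\frac{1}{(\lambda^2 + \pi^2j^2)^2} = \frac{4\lambda e^{-2\lambda} + 1 - e^{-4\lambda}}{2\lambda^3(1 - e^{-2\lambda})^2}.$$

Using the symmetry in $j$, we establish (9.14).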